MULTIVARIATE ANALYSIS AND JACOBI ENSEMBLES:
LARGEST EIGENVALUE, TRACY-WIDOM LIMITS
AND RATES OF CONVERGENCE¹

By Iain M. Johnstone 
Stanford University 

Let A and B be independent, central Wishart matrices in p variables with common covariance and having m and n degrees of freedom, respectively. The distribution of the largest eigenvalue of $(A + B)^{-1}B$ has numerous applications in multivariate statistics, but is difficult to calculate exactly. Suppose that m and n grow in proportion to p. We show that after centering and scaling, the distribution is approximated to second-order, $O(p^{-2/3})$, by the Tracy-Widom law. The results are obtained for both complex and then real-valued data by using methods of random matrix theory to study the largest eigenvalue of the Jacobi unitary and orthogonal ensembles. Asymptotic approximations of Jacobi polynomials near the largest zero play a central role.

1. Introduction. It is a striking feature of the classical theory of multivariate statistical analysis that most of the standard techniques, namely principal components, canonical correlations, multivariate analysis of variance (MANOVA), discriminant analysis and so forth, are founded on the eigenanalysis of covariance matrices.

If, as is traditional, one assumes that the observed data follow a multivariate Gaussian distribution, then that theory builds on the eigenvalues and eigenvectors of one or two matrices following the Wishart distribution. Since the "single Wishart" problem can be viewed as a limiting case of the "double Wishart" one, the fundamental setting is that of the generalized eigenproblem $\det[B - \theta(A + B)] = 0$. In the idioms of MANOVA, A represents the "within groups" or "error" covariance matrix, and B the "between groups" or "hypothesis" covariance.

In each of the standard techniques, there is a conventional "null hypothesis": independence, zero regression, etc. Corresponding test statistics may use either the full set of eigenvalues, as, for example, in the likelihood ratio test, or simply the extreme eigenvalues, as in the approach to inference advanced by S. N. Roy.

This paper focuses on the largest eigenvalue, or "latent root," and in particular on its distribution under the null hypothesis, in other words when the two Wishart matrices A and B are independent, central, and have common covariance matrix.

Even under the assumption of Gaussian data, the null distribution of the largest root is difficult to work with. It is expressed in terms of a hypergeometric function of matrix argument, with no general and simple closed form. It depends on three parameters: the common dimension of the two Wishart matrices and their respective degrees of freedom. Traditional textbooks have included tables of critical points which, due to the three parameters, can run up to twenty-five pages [Morrison (2005), Timm (1975)]. Traditional software packages have often used a one-dimensional F distribution approximation that can be astonishingly inaccurate for dimensions greater than two or three. Recently, as will be reviewed below, some exact algorithms have been made available, but they are not yet in wide use. One can speculate that the use of largest root tests has been limited in part by the lack of a simple, serviceable approximation.

The goal of this paper is to provide such an approximation, which turns out to be expressed in terms of the Tracy-Widom distribution $F_1$ of random matrix theory. This distribution is free of parameters, and can be tabulated or calculated on the fly; it plays here a role analogous to that of the standard normal distribution $\Phi$ in central limit approximations. The three Wishart parameters appear in the centering and scaling constants for the largest eigenvalue, for which we give readily computable formulas.

The approximation is an asymptotic one, developed using the models and techniques of random matrix theory in which the dimension p increases to infinity, and the degrees of freedom parameters grow in proportion to p. A pleasant surprise is that the approximation has "second-order" accuracy, in that sense loosely reminiscent of the t approximation to the normal. The traditional percentage points in the upper tail of the null distribution (90%, 95%, etc.) are reasonably well approximated for p as small as 5. In a companion paper, Johnstone (2009), it is argued that over the entire range of the parameters (i.e., p as small as 2), the Tracy-Widom approximation can yield a first screening of significance level for the largest root test that may be adequate in many, and perhaps most, applied settings.



LARGEST EIGENVALUE IN MULTIVARIATE ANALYSIS 






Some words about the organization of the paper. The remainder of this Introduction develops the double Wishart setting and states the approximation result, first for real-valued data, and then for complex-valued data, the latter involving the Tracy-Widom distribution $F_2$. Section 2 collects some of the statistical settings to which the Tracy-Widom approximation applies, along with a "dictionary" that translates the result into each setting.

The remainder of the paper develops the proofs, using methods of random matrix theory (RMT). Section 3 reformulates the results in the language of RMT, to say that the scaled Jacobi unitary and orthogonal ensembles converge to Tracy-Widom at the soft upper edge. Section 4 gives a detailed outline of the proof, noting points of novelty. As is conventional, the unitary (complex) case is treated first (Section 7), and then used as a foundation for the orthogonal (real) setting of primary interest in statistics (Section 8). Everything is based on Plancherel-Rotach asymptotics of Jacobi polynomials near their largest zero; this is developed in Sections 5 and 6 using the Liouville-Green approach to the corresponding differential equation. Some of the results of this paper were announced in Johnstone (2007).

1.1. Statement of results. Let X be an $m \times p$ normal data matrix: each row is an independent observation from $N_p(0, \Sigma)$. A $p \times p$ matrix $A = X'X$ is then said to have a Wishart distribution, $A \sim W_p(\Sigma, m)$. Let $B \sim W_p(\Sigma, n)$ be independent of $A \sim W_p(\Sigma, m)$. Assume that $m \ge p$; then $A^{-1}$ exists and the nonzero eigenvalues of $A^{-1}B$ generalize the univariate F ratio. The scale matrix $\Sigma$ has no effect on the distribution of these eigenvalues, and so without loss of generality suppose that $\Sigma = I$.

The matrix analog of a Beta variate is based on the eigenvalues of $(A + B)^{-1}B$, and leads to

Definition 1 [Mardia, Kent and Bibby (1979), page 84]. Let $A \sim W_p(I, m)$ be independent of $B \sim W_p(I, n)$, where $m \ge p$. Then the largest eigenvalue $\theta$ of $(A + B)^{-1}B$ is called the greatest root statistic and a random variate having this distribution is denoted $\theta_1(p, m, n)$, or $\theta_{1,p}$ for short.

Since A is positive definite, $0 \le \theta < 1$. Equivalently $\theta_1(p, m, n)$ is the largest root of the determinantal equation

$$\det[B - \theta(A + B)] = 0. \tag{1}$$

Specific examples will be given below, but in general the parameter p refers to dimension, m to the "error" degrees of freedom and n to the "hypothesis" degrees of freedom. Thus m + n represents the "total" degrees of freedom. The greatest root distribution has the property

$$\theta_1(p, m, n) \overset{\mathcal{D}}{=} \theta_1(n, m + n - p, p), \tag{2}$$

useful in particular in the case when $n < p$ [e.g., Mardia, Kent and Bibby (1979), page 84].

Assume p is even and that p, $m = m(p)$ and $n = n(p) \to \infty$ together in such a way that

$$\lim_{p \to \infty} \frac{\min(p, n)}{m + n} > 0, \qquad \lim_{p \to \infty} \frac{p}{m} < 1. \tag{3}$$

A consequence of our main result, stated more completely below, is that with appropriate centering and scaling, the logit transform $W_p = \operatorname{logit} \theta_{1,p} = \log(\theta_{1,p}/(1 - \theta_{1,p}))$ is approximately Tracy-Widom distributed:

$$\frac{W_p - \mu_p}{\sigma_p} \xrightarrow{\mathcal{D}} Z_1 \sim F_1. \tag{4}$$

The distribution $F_1$ was found by Tracy and Widom (1996) as the limiting law of the largest eigenvalue of a $p \times p$ Gaussian symmetric matrix; further information on $F_1$ is reviewed, for example, in Johnstone (2001). Its appearance here is an instance of the universality properties expected for largest eigenvalue distributions in random matrix theory [e.g., Deift (2007), Deift and Gioev (2007), Deift et al. (2007)].

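The centering and scaling parameters are given by

$$\mu_p = 2 \log \tan\left(\frac{\varphi + \gamma}{2}\right), \qquad \sigma_p^3 = \frac{16}{(m + n - 1)^2} \, \frac{1}{\sin^2(\varphi + \gamma) \sin\varphi \sin\gamma}, \tag{5}$$

where the angle parameters $\gamma, \varphi$ are defined by

$$\sin^2\left(\frac{\gamma}{2}\right) = \frac{\min(p, n) - 1/2}{m + n - 1}, \qquad \sin^2\left(\frac{\varphi}{2}\right) = \frac{\max(p, n) - 1/2}{m + n - 1}. \tag{6}$$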

As will be discussed later, the "correction factors" of $-1/2$ and $-1$ in (6) yield a second-order rate of convergence that has important consequences for the utility of the approximation in practice. Indeed, our main result can be formulated as follows.

Theorem 1. Assume that $m(p), n(p) \to \infty$ as $p \to \infty$ through even values of p according to (3). For each $s_0 \in \mathbb{R}$ there exists $C > 0$ such that for $s \ge s_0$,

$$|P\{W_p \le \mu_p + \sigma_p s\} - F_1(s)| \le C p^{-2/3} e^{-s/2}.$$

Here C depends on $(\gamma, \varphi)$ and also on $s_0$ if $s_0 < 0$.
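As a sketch of how (5) and (6) translate into a computation, the Python function below returns $\mu_p$ and $\sigma_p$ and the logit-scale cutoff $\mu_p + \sigma_p s$ appearing in Theorem 1. NumPy is an assumption (the paper prescribes no software), the function names `tw_constants_real` and `logit_cutoff` are hypothetical, and the Tracy-Widom quantile s must be supplied by a separate $F_1$ routine that is not included here.

```python
import numpy as np


def tw_constants_real(p, m, n):
    """Centering mu_p and scaling sigma_p from (5)-(6), real-valued data."""
    denom = m + n - 1.0
    # Angle parameters of (6), including the -1/2 and -1 correction factors.
    gamma = 2.0 * np.arcsin(np.sqrt((min(p, n) - 0.5) / denom))
    phi = 2.0 * np.arcsin(np.sqrt((max(p, n) - 0.5) / denom))
    mu = 2.0 * np.log(np.tan((phi + gamma) / 2.0))
    sigma3 = 16.0 / denom**2 / (
        np.sin(phi + gamma) ** 2 * np.sin(phi) * np.sin(gamma)
    )
    return mu, np.cbrt(sigma3)


def logit_cutoff(p, m, n, s):
    """Logit-scale value w with P{W_p <= w} approximately F_1(s) (Theorem 1)."""
    mu, sigma = tw_constants_real(p, m, n)
    return mu + sigma * s


mu, sigma = tw_constants_real(20, 160, 40)
print(f"mu_p = {mu:.4f}, sigma_p = {sigma:.4f}")
```

Because the map $(p, m, n) \mapsto (n, m + n - p, p)$ of property (2) leaves both m + n and the pair $\{\min(p, n), \max(p, n)\}$ unchanged, these constants are automatically invariant under (2).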
1.2. Exact expressions. Assume that $m, n \ge p$ and that $A \sim W_p(I, m)$ independently of $B \sim W_p(I, n)$. The joint density of the eigenvalues $1 \ge \theta_1 \ge \theta_2 \ge \cdots \ge \theta_p \ge 0$ of $(A + B)^{-1}B$, or equivalently, of the roots of $\det[B - \theta(A + B)] = 0$, simultaneously derived in 1939 by Fisher, Girshick, Hsu, Mood and Roy, is given by Muirhead (1982), page 112:

$$f(\theta) = c_1 \prod_{i=1}^{p} (1 - \theta_i)^{(m-p-1)/2} \, \theta_i^{(n-p-1)/2} \prod_{i<j} |\theta_i - \theta_j|. \tag{7}$$

The normalizing constant $c_1 = c_1(p, m, n)$ involves the multivariate gamma function; we shall not need it here.

Exact evaluation of the marginal distribution of the largest root $\theta_1$ is not a simple matter. Constantine (1963) showed that the marginal distribution could be expressed in terms of a hypergeometric function of matrix argument. Let $t = (m - p - 1)/2$, the exponent of $(1 - \theta_i)$ in (7). Then

$$P\{\theta_{1,p} \le x\} = c_2 \, x^{pn/2} \, {}_2F_1\left(\frac{n}{2}, -t; \frac{n + p + 1}{2}; xI\right). \tag{8}$$

When t is a nonnegative integer, there is a terminating series [Koev (n.d.), Muirhead (1982), page 483, Khatri (1972)] in terms of zonal polynomials $C_\kappa$:

$$P\{\theta_{1,p} \le x\} = x^{pn/2} \sum_{k=0}^{pt} \ \sum_{\kappa \vdash k,\ \kappa_1 \le t} \frac{(n/2)_\kappa \, C_\kappa((1 - x)I)}{k!}, \tag{9}$$

where $\kappa \vdash k$ signifies that $\kappa = (\kappa_1, \kappa_2, \ldots)$ is a partition of k, and $(a)_\kappa$ is a generalized hypergeometric coefficient. Further details and definitions may be found in Koev (n.d.) and Muirhead (1982). Johnstone (2009) lists further references in the literature developing tables of the distribution of $\theta_1(p, m, n)$.
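For p = 1, (7) reduces to the Beta(n/2, m/2) density, so (8) and (9) can be checked against the regularized incomplete beta function. The sketch below evaluates both formulas in this scalar case only, assuming SciPy's `hyp2f1`, `poch`, `beta` and `betainc`; the function names are hypothetical, and the matrix-argument case ($p > 1$) needs specialized software such as that of Koev and Edelman (2006), which is not attempted here.

```python
import math

from scipy.special import beta, betainc, hyp2f1, poch


def greatest_root_cdf_p1(x, m, n):
    """Formula (8) with p = 1, where (7) is the Beta(n/2, m/2) density."""
    p = 1
    t = (m - p - 1) / 2.0  # exponent of (1 - theta) in (7)
    a = n / 2.0
    c2 = 1.0 / (a * beta(a, m / 2.0))  # normalizing constant when p = 1
    return c2 * x ** (p * n / 2.0) * hyp2f1(a, -t, (n + p + 1) / 2.0, x)


def greatest_root_series_p1(x, m, n):
    """Terminating series (9) with p = 1; needs t = (m - 2)/2 to be an integer."""
    if m % 2 != 0 or m < 2:
        raise ValueError("t = (m - 2)/2 must be a nonnegative integer")
    t = (m - 2) // 2
    a = n / 2.0
    # With p = 1 each partition of k is the single part (k), and the zonal
    # polynomial C_(k)(y) reduces to y**k.
    return x**a * sum(
        poch(a, k) * (1 - x) ** k / math.factorial(k) for k in range(t + 1)
    )


# Both should agree with I_x(n/2, m/2), the regularized incomplete beta function.
print(greatest_root_cdf_p1(0.5, 7, 5), betainc(2.5, 3.5, 0.5))
print(greatest_root_series_p1(0.5, 8, 5), betainc(2.5, 4.0, 0.5))
```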

Recently, Koev and Edelman (2006) have exploited recursion relations among Jack functions to develop efficient evaluations of hypergeometric functions of matrix argument. Current MATLAB software implementations (Koev, private communication) allow convenient evaluation, with up to 1 second of computation time, of (8) for $m, n, p \le 17$, and of (9) for $m, n, p \le 40$ when t is an integer.

1.3. Numerical illustrations. Table 1 and Figure 1 show results of some simulations to test the Tracy-Widom approximation. A companion paper [Johnstone (2009)] has further information on the quality of the distributional approximation.

To the left of the vertical line are three situations in which m = 8p and n = 2p, that is, where the error and hypothesis degrees of freedom are comfortably larger than dimension. In the second setting, to the right of the line, this is no longer true: m = 2p and n = p.
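The experiments reported in Table 1 were run in MATLAB. As a rough Python analogue, and an assumption rather than the author's code, the sketch below draws $\theta_1$ directly from Definition 1 using NumPy and SciPy's generalized symmetric eigensolver `scipy.linalg.eigh`; the function name and seed handling are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def simulate_greatest_root(p, m, n, reps, seed=0):
    """Monte Carlo draws of theta_1(p, m, n) from Definition 1 with Sigma = I."""
    rng = np.random.default_rng(seed)
    draws = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((m, p))  # A = X'X ~ W_p(I, m)
        Y = rng.standard_normal((n, p))  # B = Y'Y ~ W_p(I, n)
        A, B = X.T @ X, Y.T @ Y
        # Largest root of det[B - theta (A + B)] = 0, i.e. equation (1);
        # eigh returns generalized eigenvalues in ascending order.
        draws[r] = eigh(B, A + B, eigvals_only=True)[-1]
    return draws


theta = simulate_greatest_root(p=20, m=160, n=40, reps=1000)
w = np.log(theta / (1 - theta))  # logit scale used in Theorem 1
```

Standardizing `w` by $\mu_p$ and $\sigma_p$ from (5)-(6) and comparing empirical frequencies with $F_1$ gives the kind of comparison shown in Table 1; the $F_1$ values themselves must come from a separate Tracy-Widom routine.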
Table 1

First column shows the percentiles of the $F_1$ limit distribution corresponding to fractions in second column. Next three columns show estimated cumulative probabilities for $\theta_1$ in R = 10,000 repeated draws from the two Wishart setting of Definition 1, with indicated values of (p, m, n). The following three columns show estimated cumulative probabilities for $w = \log(\theta/(1 - \theta))$, again in R = 10,000 draws with the indicated values of (p, m, n). Final column gives approximate standard errors based on binomial sampling. Bold font highlights some conventional significance levels. The Tracy-Widom distribution $F_1$ was evaluated on a grid of 121 points $-6(0.1)6$ using the Mathematica package p2Num written by Craig Tracy. Remaining computations were done in MATLAB, with percentiles obtained by inverse interpolation, and using randn() for normal variates and norm() to evaluate the largest eigenvalue of the matrices appearing in Definition 1.
In the first setting, the largest eigenvalue distribution is concentrated around $\mu_\theta \approx 0.5$ and with scale $\sigma_\theta$ small enough that the effect of the boundary at 1 is hardly felt. In this setting, the logit transform $w = \log(\theta/(1 - \theta))$ is less important for improving the quality of the approximation. Indeed, the three first columns show the result of using

$$\mu_\theta = \frac{e^{\mu_p}}{1 + e^{\mu_p}}, \qquad \sigma_\theta = \mu_\theta(1 - \mu_\theta)\sigma_p.$$

1.4. Complex-valued data. Data matrices X based on complex-valued data arise frequently, for example, in signal processing applications [e.g., Tulino and Verdú (2004)]. If the rows of X are drawn independently from a complex normal distribution $CN(\mu, \Sigma)$ [see, e.g., James (1964), Section 7], then we say $A = X'X \sim CW_p(\Sigma, n)$. Here X' denotes the conjugate transpose of X.

Fig. 1. First panel: probability plots of R = 10,000 observed replications of $\theta_1$, largest root of (1), for p = 20, n = 40, m = 160. That is, the 10,000 ordered observed values of $\theta_1$ are plotted against $F_1^{-1}((i - 0.5)/R)$, i = 1, ..., R. The vertical lines show 1st, 95th and 99th percentiles. The dotted line is the 45° line of perfect agreement of empirical law with asymptotic limit. Second panel: same plots for p = n = 50, m = 100, but now on logit scale, plotting $W_p = \log(\theta_1/(1 - \theta_1))$.

In parallel with the real case definition, if $A \sim CW_p(I, m)$ and $B \sim CW_p(I, n)$ are independent, then the joint density of the eigenvalues $1 \ge \theta_1 \ge \theta_2 \ge \cdots \ge \theta_p \ge 0$ of $(A + B)^{-1}B$, or equivalently, the roots of $\det[B - \theta(A + B)] = 0$, is given, for example, by James (1964),

$$f(\theta) = c_1^C \prod_{i=1}^{p} (1 - \theta_i)^{m-p} \, \theta_i^{n-p} \prod_{i<j} (\theta_i - \theta_j)^2. \tag{10}$$

The largest eigenvalue $\theta_1^C$ of $(A + B)^{-1}B$ is called the greatest root statistic, with distribution $\theta^C(p, m, n)$. The property (2) carries over to the complex case.

Again let $W_p^C = \operatorname{logit} \theta_1^C = \log(\theta_1^C/(1 - \theta_1^C))$.

Theorem 2. Assume that $m(p), n(p) \to \infty$ as $p \to \infty$ according to (3). For each $s_0 \in \mathbb{R}$, there exists $C > 0$ such that for $s \ge s_0$,

$$|P\{W_p^C \le \mu_p^C + \sigma_p^C s\} - F_2(s)| \le C p^{-2/3} e^{-s/2}.$$

Here C depends on $(\gamma, \varphi)$ and also on $s_0$ if $s_0 < 0$.

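The limiting distribution is now the unitary Tracy-Widom distribution $F_2$ [Tracy and Widom (1994)]. To describe the complex centering and scaling constants, we introduce a parameterization basic to the paper:

$$N = \min(n, p), \qquad \alpha = m - p, \qquad \beta = |n - p|.$$

Then $\mu_p^C$ and $\sigma_p^C$ use weighted averages based on the parameter sets $(N, \alpha, \beta)$ and $(N - 1, \alpha, \beta)$:

$$\mu_p^C = \frac{\tau_N^{-1} w_N + \tau_{N-1}^{-1} w_{N-1}}{\tau_N^{-1} + \tau_{N-1}^{-1}}, \qquad \frac{1}{\sigma_p^C} = \frac{1}{2}\left(\frac{1}{\tau_N} + \frac{1}{\tau_{N-1}}\right),$$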
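where

$$w_N = 2 \log \tan\left(\frac{\varphi_N + \gamma_N}{2}\right), \qquad \tau_N^3 = \frac{16}{(2N + \alpha + \beta + 1)^2} \, \frac{1}{\sin^2(\varphi_N + \gamma_N) \sin\varphi_N \sin\gamma_N},$$

and

$$\sin^2\left(\frac{\gamma_N}{2}\right) = \frac{N + 1/2}{2N + \alpha + \beta + 1}, \qquad \sin^2\left(\frac{\varphi_N}{2}\right) = \frac{N + \beta + 1/2}{2N + \alpha + \beta + 1}.$$

Quantities $w_{N-1}$, $\tau_{N-1}$ are based on $\varphi_{N-1}$, $\gamma_{N-1}$, with $N - 1$ substituted everywhere for N, but with $\alpha, \beta$ unchanged.²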

The remarks made earlier about exact expressions for largest eigenvalue distributions have analogs in the complex case; see the references cited earlier, Dumitriu and Koev (2008) and determinantal identities (2.10) and (2.11) in Koev (n.d.).

1.5. Remark on software. Given a routine to compute the Tracy-Widom distribution, it is a simple matter to code and use the formulas given in this paper. Some references to then-extant software were given in Johnstone (2007). Further detail is planned for the completed version of Johnstone (2009).

2. Related statistical settings and implications. In the first part of this section, we list five common settings in multivariate statistics to which the largest eigenvalue convergence result applies, along with the parameterizations appropriate to each.

2.1. Double Wishart models. 

2.1.1. Canonical correlation analysis. Suppose that there are n observations on each of p + q variables. For definiteness, assume that $p \le q$. The first p variables are grouped into an $n \times p$ data matrix $X = [x_1\, x_2 \cdots x_p]$ and the last q into an $n \times q$ matrix $Y = [y_1\, y_2 \cdots y_q]$. Write $S_{XX} = X^T X$, $S_{XY} = X^T Y$ and $S_{YY} = Y^T Y$ for the cross-product matrices. Canonical correlation analysis (CCA), or more precisely, the zero-mean version of CCA, seeks the linear combinations $a^T x$ and $b^T y$ that are most highly correlated, that is, to maximize

$$r = \operatorname{Corr}(a^T x, b^T y) = \frac{a^T S_{XY} b}{\sqrt{a^T S_{XX} a}\, \sqrt{b^T S_{YY} b}}. \tag{11}$$

² This use of the notation $w_N$, $\tau_N$ is local to this Introduction and the Remark concluding Section 7.1 and not to be confused with other uses in the detailed proofs.

This leads to a maximal correlation $r_1$ and associated canonical vectors $a_1$ and $b_1$, usually each taken to have unit length. The procedure may be iterated, restricting the search to vectors orthogonal to those already found:

$$r_k = \max\{a^T S_{XY} b : a^T S_{XX} a = b^T S_{YY} b = 1, \text{ and } a^T S_{XX} a_j = b^T S_{YY} b_j = 0, \text{ for } 1 \le j < k\}.$$

The successive canonical correlations $r_1 \ge r_2 \ge \cdots \ge r_p \ge 0$ may be found as the roots of the determinantal equation

$$\det(S_{XY} S_{YY}^{-1} S_{YX} - r^2 S_{XX}) = 0 \tag{12}$$

[see, e.g., Mardia, Kent and Bibby (1979), page 284]. A typical question in application is then how many of the $r_k$ are significantly different from zero.

Substitute the cross-product matrix definitions into the CCA determinantal equation (12) to obtain $\det(X^T Y (Y^T Y)^{-1} Y^T X - r^2 X^T X) = 0$. Let P denote the $n \times n$ orthogonal projection matrix $Y(Y^T Y)^{-1} Y^T$ and $P^\perp = I - P$ its complement. Then with $B = X^T P X$ and $A = X^T P^\perp X$, (12) becomes

$$\det(B - r^2(A + B)) = 0. \tag{13}$$
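The reduction from (12) to (13) can be checked numerically: since $A + B = X^T X = S_{XX}$ and $B = S_{XY} S_{YY}^{-1} S_{YX}$, both routes should return the same squared canonical correlations. The sketch below assumes NumPy and SciPy; the function names are hypothetical and the data are simulated mean-zero Gaussian matrices.

```python
import numpy as np
from scipy.linalg import eigh


def cca_roots_from_12(X, Y):
    """Squared canonical correlations r_k^2 as roots of (12), largest first."""
    Sxx, Syy, Sxy = X.T @ X, Y.T @ Y, X.T @ Y
    M = Sxy @ np.linalg.solve(Syy, Sxy.T)  # S_XY S_YY^{-1} S_YX
    M = (M + M.T) / 2  # symmetrize against rounding error
    return eigh(M, Sxx, eigvals_only=True)[::-1]


def cca_roots_from_13(X, Y):
    """Same roots via (13): det(B - r^2 (A + B)) = 0 with projection matrices."""
    n = X.shape[0]
    P = Y @ np.linalg.solve(Y.T @ Y, Y.T)  # orthogonal projection onto col(Y)
    B = X.T @ P @ X
    A = X.T @ (np.eye(n) - P) @ X
    AB = A + B
    return eigh((B + B.T) / 2, (AB + AB.T) / 2, eigvals_only=True)[::-1]


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
Y = rng.standard_normal((50, 5))
r2 = cca_roots_from_12(X, Y)
```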

Now assume that $Z = [X\ Y]$ is an $n \times (p + q)$ normal data matrix with mean zero. The covariance matrix is partitioned

$$\Sigma = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix}.$$

Under these Gaussian assumptions, the X and Y variable sets will be independent if and only if $\Sigma_{XY} = 0$. This is equivalent to asserting that the population canonical correlations all vanish: $\rho_1 = \cdots = \rho_p = 0$.

The canonical correlations $(\rho_1, \ldots, \rho_p)$ are invariant under block diagonal transformations $(x_i, y_i) \to (B x_i, C y_i)$ of the data (for B and C nonsingular $p \times p$ and $q \times q$ matrices, resp.). It follows that under the null hypothesis $H_0: \Sigma_{XY} = 0$, the distribution of the canonical correlations can be found (without loss of generality) by assuming that $\Sigma_{XX} = I_p$ and $\Sigma_{YY} = I_q$. In this case, the matrices A and B of (13) are independent with $B \sim W_p(q, I)$ and $A \sim W_p(n - q, I)$.

From the definition, the largest squared canonical correlation $\theta_1 = r_1^2$ has the $\theta(p, n - q, q)$ distribution under the null hypothesis $\Sigma_{XY} = 0$.
Mean correction. In practice, it is more common to allow each variable to have a separate, unknown mean. One forms the variable means $\bar x_j = n^{-1} \sum_{k=1}^{n} x_{j,k}$ and replaces $x_j$ by $x_j - \bar x_j 1$, and similarly for the second set of variables $y_j$. The entries $S_{XY}$, etc. in (11) are now blocks in the partitioned sample covariance matrix: if $P_c = I_n - n^{-1} 1 1^T$, then

$$S_{XY} = (P_c X)^T (P_c Y) = X^T P_c Y, \qquad S_{XX} = X^T P_c X, \quad \text{etc.}$$

For the distribution theory, suppose that $Z = [X\ Y]$ is an $n \times (p + q)$ normal data matrix with mean $(\mu_X\ \mu_Y)$ and covariance $\Sigma$. Applying mean-corrected CCA as above, then under $\Sigma_{XY} = 0$, the largest squared canonical correlation $\theta_1 = r_1^2$ has distribution $\theta_1(p, n' - q, q)$, where $n' = n - 1$. Indeed, let H' be the upper $(n - 1) \times n$ block of an orthogonal matrix with nth row equal to $n^{-1/2} 1^T$. Then $Z' = H'Z$ turns out [e.g., Mardia, Kent and Bibby (1979), page 65] to be a normal data matrix with mean 0, covariance $\Sigma$ and sample size n - 1, to which our mean-zero discussion above applies. Since $H'^T H' = P_c$, the mean-zero prescription (11) applied to $[X'\ Y']$ yields the same canonical correlations as does the usual mean-centered approach applied to $[X\ Y]$.
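The equivalence between mean-centered CCA and zero-mean CCA on $H'Z$ can be illustrated numerically. The sketch below assumes SciPy's `scipy.linalg.helmert`, whose default output is an $(n - 1) \times n$ matrix with orthonormal rows orthogonal to the vector of ones; its rows are ordered differently from the H' of the text, which does not affect $H'^T H' = P_c$. Canonical correlations are computed as cosines of principal angles via `scipy.linalg.subspace_angles`; the helper name is hypothetical.

```python
import numpy as np
from scipy.linalg import helmert, subspace_angles


def canonical_correlations(X, Y):
    """Zero-mean CCA: cosines of principal angles between col(X) and col(Y)."""
    return np.sort(np.cos(subspace_angles(X, Y)))[::-1]


rng = np.random.default_rng(2)
n = 30
Z = rng.standard_normal((n, 5)) + np.array([3.0, -1.0, 2.0, 0.5, 4.0])  # nonzero means
X, Y = Z[:, :2], Z[:, 2:]

# Usual route: center each column, then apply zero-mean CCA.
r_centered = canonical_correlations(X - X.mean(axis=0), Y - Y.mean(axis=0))

# Route of the text: H'^T H' = P_c, so apply zero-mean CCA to H'X and H'Y.
H = helmert(n)
r_helmert = canonical_correlations(H @ X, H @ Y)
```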

2.1.2. Angles and distances between subspaces. The cosine of the angle between two vectors $u, v \in \mathbb{R}^n$ is given by

$$\sigma(u, v) = |u^T v| / (\|u\|_2 \|v\|_2).$$

Consequently, (11) becomes $r = \sigma(Xa, Yb)$. Writing $\mathcal{X}$ and $\mathcal{Y}$ for the subspaces spanned by the columns of X and Y, then the canonical correlations $r_k = \cos\vartheta_k$ are just the cosines of the principal angles between $\mathcal{X}$ and $\mathcal{Y}$ [e.g., Golub and Van Loan (1996), page 603].

The closeness of two equidimensional subspaces can be measured by the largest angle between vectors in the two spaces:

$$d(\mathcal{X}, \mathcal{Y}) = \min_{a} \max_{b} \sigma(Xa, Yb) = r_p,$$

the smallest canonical correlation. This is equivalent to the 2-norm of the distance between orthoprojections on $\mathcal{X}$ and $\mathcal{Y}$: $\|P_{\mathcal{X}} - P_{\mathcal{Y}}\|_2 = \sin\vartheta_p = \sqrt{1 - r_p^2}$.
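The relation $\|P_{\mathcal{X}} - P_{\mathcal{Y}}\|_2 = \sqrt{1 - r_p^2}$ for equidimensional subspaces can be confirmed on random Gaussian matrices. This sketch assumes SciPy's `orth` and `subspace_angles`; `projector` is a hypothetical helper.

```python
import numpy as np
from scipy.linalg import orth, subspace_angles


def projector(M):
    """Orthogonal projector onto the column space of M."""
    Q = orth(M)
    return Q @ Q.T


rng = np.random.default_rng(3)
n, p = 20, 3
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, p))  # equidimensional subspaces, as in the text

r = np.cos(subspace_angles(X, Y))  # canonical correlations r_k
r_p = r.min()  # smallest canonical correlation
dist = np.linalg.norm(projector(X) - projector(Y), 2)  # spectral norm
```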

Random subspaces. A standard way to realize a draw from the uniform (Haar) distribution on the Grassmann manifold of p-dimensional subspaces of $\mathbb{R}^n$ is to let $\mathcal{X} = \operatorname{span}(X)$, with the entries of the $n \times p$ matrix X being i.i.d. standard Gaussian. If $X_{n \times p}$ and $Y_{n \times q}$ are two such independent Gaussian matrices, then the squared cosines of the principal angles between $\mathcal{X}$ and $\mathcal{Y}$ are given by the roots of (13), with $A \sim W_p(n - q, I)$ independently of $B \sim W_p(q, I)$. In the language of the next section, the Jacobi orthogonal ensemble thus arises as the distribution of the squared principal cosines between two random subspaces. Similar statements hold for complex-valued Gaussian data matrices X, Y and the Jacobi unitary ensemble [cf. Collins (2005), Theorem 2.2, and Absil, Edelman and Koev (2006)].
2.1.3. Multivariate linear model. In the standard generalization of the linear regression model to allow for multivariate responses, it is assumed that

$$Y = XB + U,$$

where $Y$ $(n \times p)$ is an observed matrix of p response variables on each of n individuals, $X$ $(n \times q)$ is a known design matrix, $B$ $(q \times p)$ is a matrix of unknown regression parameters and U is a matrix of unobserved random disturbances. For distribution theory it is assumed that U is a normal data matrix, so that the rows are independent Gaussian, each with mean 0 and common covariance $\Sigma$.

Consider a null hypothesis of the form CBM = 0. Here it is assumed that $C$ $(g \times q)$ has rank g. The rows of C make assertions about the effect of linear combinations of the "independent" variables on the regression: the number of hypothesis degrees of freedom is g. The matrix $M$ $(p \times r)$ is taken to have rank r. The columns of M focus attention on particular linear combinations of the dependent variables: the "dimension" of the null hypothesis equals r. The union-intersection test of this null hypothesis is based on the greatest root $\theta$ of $H(H + E)^{-1}$ for the independent Wishart matrices H and E described, for example, in Mardia, Kent and Bibby (1979), page 162. Under the null hypothesis, $\theta \sim \theta(r, n - q, g)$. The companion paper Johnstone (2009) focuses in greater detail on the application of Theorem 1 in the multivariate linear model.

2.1.4. Equality of covariance matrices. Suppose that independent samples from two normal distributions $N_p(\mu_1, \Sigma_1)$ and $N_p(\mu_2, \Sigma_2)$ lead to covariance estimates $\hat\Sigma_i$ which are independent and Wishart distributed on $n_i$ degrees of freedom: $A_i = n_i \hat\Sigma_i \sim W_p(n_i, \Sigma_i)$ for i = 1, 2. Then the largest root test of the null hypothesis $H_0: \Sigma_1 = \Sigma_2$ is based on the largest eigenvalue $\theta$ of $(A_1 + A_2)^{-1} A_2$, which under $H_0$ has the $\theta(p, n_1, n_2)$ distribution [Muirhead (1982), page 332].

2.1.5. Multiple discriminant analysis. Suppose that there are g populations, the ith population being assumed to follow a p-variate normal distribution $N_p(\mu_i, \Sigma)$, with the covariance matrix assumed to be unknown, but common to all populations. A sample of size $n_i$ is available from the ith population, yielding a total $n = \sum n_i$ observations. Multiple discriminant analysis uses the "within groups" and "between groups" sums of squares and products matrices W and B to construct linear discriminant functions based on eigenvectors of $W^{-1}B$. A test of the null hypothesis that discrimination is not worthwhile ($\mu_1 = \cdots = \mu_g$) can be based, for example, on the largest root of $W^{-1}B$, which leads to use of the $\theta(p, n - g, g - 1)$ distribution [Mardia, Kent and Bibby (1979), pages 318 and 138].
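The discriminant test is phrased through $W^{-1}B$, whereas Definition 1 uses $(A + B)^{-1}B$. If $\lambda$ is an eigenvalue of $W^{-1}B$, then $\theta = \lambda/(1 + \lambda)$ is an eigenvalue of $(W + B)^{-1}B$, so the largest roots correspond. The sketch below, assuming NumPy and SciPy with a hypothetical function name, forms W and B from simulated groups and checks this correspondence.

```python
import numpy as np
from scipy.linalg import eigh


def discriminant_roots(samples):
    """Within (W) and between (B) SSP matrices for a list of (n_i x p) samples.

    Returns the largest eigenvalue of W^{-1}B and the corresponding greatest
    root of (W + B)^{-1}B, the statistic referred to theta(p, n - g, g - 1).
    """
    grand_mean = np.vstack(samples).mean(axis=0)
    p = grand_mean.size
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for x in samples:
        xbar = x.mean(axis=0)
        resid = x - xbar
        W += resid.T @ resid
        d = (xbar - grand_mean)[:, None]
        B += x.shape[0] * (d @ d.T)
    lam = eigh(B, W, eigvals_only=True)[-1]
    theta = eigh(B, W + B, eigvals_only=True)[-1]
    return lam, theta


rng = np.random.default_rng(4)
groups = [rng.standard_normal((15, 4)) for _ in range(3)]  # g = 3, equal means
lam, theta = discriminant_roots(groups)
```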
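Table 2

Setting | Model and null hypothesis | p | m | n
CCA | $[X\ Y] \sim N_{p+q}(0, I_n \otimes \Sigma)$, $H_0: \Sigma_{XY} = 0$ | p | n - q | q
Multivariate linear model | $Y_{n \times p} = X_{n \times q} B_{q \times p} + U_{n \times p}$, $H_0: C_{g \times q} B_{q \times p} M_{p \times r} = 0$ | r (dimension) | n - q (error d.f.) | g (hypothesis d.f.)
Equality of covariance | $n_i \hat\Sigma_i \sim W_p(n_i, \Sigma_i)$, $H_0: \Sigma_1 = \Sigma_2$ | p | $n_1$ | $n_2$
Multiple discriminant analysis | $n_i$ observations on g populations $N_p(\mu_i, \Sigma)$, i = 1, ..., g | p | n - g | g - 1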

Table 2 summarizes the correspondences between the parameters in these 
various models and those used in Theorem 1. 

2.2. Discussion and implications. 

Limiting empirical spectrum. The empirical distribution of eigenvalues $\theta_i$ of $(A + B)^{-1}B$ is defined to be

$$F_p(\theta) = p^{-1} \#\{i : \theta_i \le \theta\}.$$

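Wachter (1980) obtained the limiting distribution of $F_p$ in an asymptotic regime (3) in which m and n grow proportionally with p. We recall Wachter's result, in the new parameterization given by (6). Suppose, for convenience, that $p \le n$, and let

$$\theta_\pm = \sin^2\left(\frac{\varphi \pm \gamma}{2}\right),$$

or, more precisely, the limit as $p \to \infty$ under assumption (3). Then for each $\theta \in [0, 1]$, $F_p(\theta) \to \int_0^\theta f(\theta')\, d\theta'$, where the limiting density, supported on $[\theta_-, \theta_+]$, has the form

$$f(\theta) = \frac{\sqrt{(\theta_+ - \theta)(\theta - \theta_-)}}{c\, \theta(1 - \theta)}, \qquad c = 2\pi \sin^2(\gamma/2).$$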
This is the analog for two Wishart matrices of the celebrated semicircle law 
for square symmetric matrices, and the Marcenko-Pastur quarter-circle law 
for a single Wishart matrix [for references, see, e.g., Johnstone (2001)]. 
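As a check on the normalizing constant $c = 2\pi\sin^2(\gamma/2)$, the sketch below integrates the limiting density numerically with SciPy's `quad`. It assumes $p \le n$ and, as an illustrative simplification labeled in the code, uses the limiting angles $\sin^2(\gamma/2) = p/(m + n)$ and $\sin^2(\varphi/2) = n/(m + n)$ in place of the finite-p values (6); the function name is hypothetical.

```python
import numpy as np
from scipy.integrate import quad


def wachter_density(p, m, n):
    """Limiting spectral density of (A + B)^{-1}B for p <= n.

    Assumption: uses the limiting angles sin^2(gamma/2) = p/(m+n) and
    sin^2(phi/2) = n/(m+n), i.e. (6) without the finite-p correction factors.
    """
    if p > n:
        raise ValueError("this sketch assumes p <= n")
    s = m + n
    gamma = 2 * np.arcsin(np.sqrt(p / s))
    phi = 2 * np.arcsin(np.sqrt(n / s))
    lo = np.sin((phi - gamma) / 2) ** 2
    hi = np.sin((phi + gamma) / 2) ** 2
    c = 2 * np.pi * np.sin(gamma / 2) ** 2

    def f(theta):
        inside = max((hi - theta) * (theta - lo), 0.0)
        return np.sqrt(inside) / (c * theta * (1 - theta))

    return f, lo, hi


f, lo, hi = wachter_density(20, 160, 40)
mass, _ = quad(f, lo, hi)  # should be close to 1
```

The constant follows from splitting $1/(\theta(1 - \theta))$ as $1/\theta + 1/(1 - \theta)$: the two resulting integrals are $\frac{\pi}{2}(\sqrt{\theta_+} - \sqrt{\theta_-})^2$ and $\frac{\pi}{2}(\sqrt{1 - \theta_-} - \sqrt{1 - \theta_+})^2$, which sum to $2\pi\sin^2(\gamma/2)$.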

In the canonical correlation setting (p and q variables and n samples), the parameters $\theta_\pm$ represent the limiting maximum and minimum squared correlation. They are expressed in terms of the half-angles $\gamma/2$ and $\varphi/2$, which for p/n and q/n fairly small are roughly

$$\gamma/2 \approx \sqrt{p/n}, \qquad \varphi/2 \approx \sqrt{q/n}.$$

Smallest eigenvalue. Assume that $A \sim W_p(I, m)$ independently of $B \sim W_p(I, n)$ and that both $m, n \ge p$. In this case, all p eigenvalues of $A^{-1}B$ are positive a.s., and we have the identity

$$\theta_1\{(A + B)^{-1}B\} = 1 - \theta_p\{(A + B)^{-1}A\}, \tag{14}$$

where $\theta_k(C)$ denotes the kth ordered eigenvalue of C, with $\theta_1$ being smallest.
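Identity (14) follows from $(A + B)^{-1}A = I - (A + B)^{-1}B$, so the two spectra are reflections of each other about 1/2. A short numerical check, assuming NumPy and SciPy and using the ordering convention stated for (14):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
p, m, n = 4, 12, 9
X = rng.standard_normal((m, p))
Y = rng.standard_normal((n, p))
A, B = X.T @ X, Y.T @ Y

# Generalized eigenvalues in ascending order: index 0 is smallest, -1 largest.
roots_B = eigh(B, A + B, eigvals_only=True)  # eigenvalues of (A+B)^{-1}B
roots_A = eigh(A, A + B, eigvals_only=True)  # eigenvalues of (A+B)^{-1}A

# Smallest root for B equals 1 minus the largest root for A, as in (14).
lhs, rhs = roots_B[0], 1 - roots_A[-1]
```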

Let $\theta^-(p, m, n)$ denote a random variable having the distribution of the smallest eigenvalue of $(A + B)^{-1}B$. Clearly

$$\theta^-(p, m, n) \overset{\mathcal{D}}{=} 1 - \theta(p, n, m),$$

and if we set $\theta' = \theta(p, n, m)$, then

$$W_p^- = \log \frac{\theta^-}{1 - \theta^-} = -\log \frac{\theta'}{1 - \theta'}.$$

If we therefore set

$$\mu^-(p, m, n) = -\mu(p, n, m), \qquad \sigma^-(p, m, n) = \sigma(p, n, m),$$

where $\mu(p, n, m)$ and $\sigma(p, n, m)$ are given by (5)-(6) (with n and m interchanged), we have convergence to a Tracy-Widom distribution reflected at 0:

$$\frac{W_p^- - \mu^-}{\sigma^-} \xrightarrow{\mathcal{D}} -Z_1,$$

or, writing $\bar F_1(t) = 1 - F_1(t)$ for the complementary Tracy-Widom distribution function,

$$|P\{W_p^- \le \mu^- - \sigma^- t\} - \bar F_1(t)| \le C p^{-2/3} e^{-ct}.$$
This form highlights the phenomenon that convergence for the distribution of $\theta^-(p, m, n)$ is best in the left tail. As with $\theta_1(p, m, n)$, the approximation is best in the part of the distribution furthest from the bulk of the eigenvalues.

Analogy with t. Here is an admittedly loose analogy between the null distribution of the largest eigenvalue and that of the t-statistic. Both cases assume Gaussian data, but in the t case, the test is on the mean $\mu$, while for $\theta_1$ it concerns the covariance structure. In both settings, the exact null distribution is known, but one is interested in the rate of convergence to the limiting distribution which is used for approximation. Table 3 compares our result (in the canonical correlations version) with a standard fact about the Gaussian approximation to the t distribution: if the parent distribution is Gaussian, then the convergence is second-order, in that the error term is of order $1/n$ rather than the first-order error $1/\sqrt{n}$ associated with central limit theorem convergence.
Table 3

 | t-statistic $\sqrt{n}\, \bar x / s$ | largest root $\theta_1$ of A, B
Model | $x_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$ | $(x_i, y_i) \sim N(0, \Sigma)$
Null hypothesis | $H_0: \mu = 0$ | $H_0: \Sigma_{xy} = 0$
Exact law | $t \sim t_{n-1}$ | $\theta_1 \sim \mathrm{JOE}_p(n - q - p, q - p)$
Approx. law | $\Phi(x)$ | $F_1(x) = \exp\{-\frac{1}{2} \int_x^\infty [q(s) + (s - x) q^2(s)]\, ds\}$
Convergence | $O(n^{-1})$, not $O(n^{-1/2})$ | $O(p^{-2/3})$, not $O(p^{-1/3})$



Convergence of quantiles. Let $F_{p,1}$ denote the distribution function of $(W_p - \mu_p)/\sigma_p$; Theorem 1 asserts the convergence of $F_{p,1}(s)$ to $F_1(s)$ at rate $p^{-2/3}$. For given $\alpha \in (0, 1)$, let $s_p(\alpha) = F_{p,1}^{-1}(\alpha)$ and $s(\alpha) = F_1^{-1}(\alpha)$ denote the $\alpha$th quantiles of $F_{p,1}$ and $F_1$, respectively. Under the assumptions of Theorem 1, we have convergence of the quantiles at rate $p^{-2/3}$: since $F_1(s(\alpha)) = \alpha = F_{p,1}(s_p(\alpha))$, this follows from

$$f_{\min}(\alpha)\, |s_p(\alpha) - s(\alpha)| \le |F_1(s_p(\alpha)) - F_1(s(\alpha))| = |F_1(s_p(\alpha)) - F_{p,1}(s_p(\alpha))| \le C(\alpha) p^{-2/3},$$

where $f_{\min}(\alpha)$ denotes the minimum value of the density $f_1(s) = F_1'(s)$ for values of s between $s_p(\alpha)$ and $s(\alpha)$.

Convergence of $\theta_{1,p}$. An informal way of writing the conclusion of Theorem 1 is

$$W_p = \mu_p + \sigma_p Z_1 + O(p^{-4/3}),$$

as may be seen by noting that $W_p$ and $Z_1$ can be defined on a common space using a U(0, 1) variate U, setting $W_p = \mu_p + \sigma_p F_{p,1}^{-1}(U)$ and $Z_1 = F_1^{-1}(U)$, and using the remark of the previous paragraph.

A straightforward delta-method argument now shows, for smooth functions g(w), that

$$g(W_p) = g(\mu_p) + \sigma_p g'(\mu_p) Z_1 + O(p^{-4/3}).$$

In particular, with the logistic transformation $g(w) = e^w/(1 + e^w)$, we get, on the original scale,

$$\theta_{1,p} = \mu_\theta + \sigma_\theta Z_1 + O(p^{-4/3}),$$

with

$$\mu_\theta = \sin^2\left(\frac{\varphi + \gamma}{2}\right), \qquad \sigma_\theta^3 = \frac{\sin^4(\varphi + \gamma)}{4(m + n - 1)^2 \sin\varphi \sin\gamma}.$$
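The closed forms for $\mu_\theta$ and $\sigma_\theta$ can be cross-checked against the delta method applied to (5), using $g'(w) = g(w)(1 - g(w))$ for the logistic map; this is the same conversion used for the first three columns of Table 1. The sketch assumes NumPy, and its function name is hypothetical.

```python
import numpy as np


def theta_scale_two_ways(p, m, n):
    """Closed-form (mu_theta, sigma_theta) versus the delta method applied to (5)."""
    denom = m + n - 1.0
    gamma = 2 * np.arcsin(np.sqrt((min(p, n) - 0.5) / denom))
    phi = 2 * np.arcsin(np.sqrt((max(p, n) - 0.5) / denom))
    sg, sp, spg = np.sin(gamma), np.sin(phi), np.sin(phi + gamma)

    # Closed forms stated in the text.
    mu_theta = np.sin((phi + gamma) / 2) ** 2
    sigma_theta = np.cbrt(spg**4 / (4 * denom**2 * sp * sg))

    # Delta method: g(w) = e^w / (1 + e^w) applied to mu_p, sigma_p of (5).
    mu_p = 2 * np.log(np.tan((phi + gamma) / 2))
    sigma_p = np.cbrt(16 / denom**2 / (spg**2 * sp * sg))
    g = 1 / (1 + np.exp(-mu_p))
    return (mu_theta, sigma_theta), (g, g * (1 - g) * sigma_p)


closed, delta = theta_scale_two_ways(20, 160, 40)
```

For the first setting of Table 1 (p = 20, m = 160, n = 40), $\mu_\theta$ comes out close to 0.5, consistent with the remark that the boundary at 1 is hardly felt there.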
If the $p^{-2/3}$ convergence rate is invariant to smooth transformations g, what is the special role of the logit transform in Theorem 1? Several related observations may be offered. First, empirical data analysis of quantities between 0 and 1, such as $\theta_1$, is often aided by the logit transform; indeed the idea to use it in this setting first emerged from efforts to improve the approximation in probability plots such as Figure 1. [Noting that $\theta_i$ may be thought of as squared canonical correlations, one also recalls that Fisher's $z = \tanh^{-1} r$ transformation improves the normal approximation in the case of a single coefficient.]

Second, at a technical level, as explained in Section 4, our operator convergence argument uses an integral representation of the Jacobi correlation kernel whose form is most similar to the Airy kernel when expressed using the hyperbolic tangent.

Finally, a geometric perspective: a natural metric on the cone of positive definite symmetric matrices is given by $d^2(A, B) = \sum \log^2 w_i$, where $\{w_i\}$ are the eigenvalues of $A^{-1}B$. Expressed in terms of the eigenvalues $\theta_i = w_i/(1 + w_i)$ of $(A + B)^{-1}B$, we get $d^2(A, B) = \sum \log^2[\theta_i/(1 - \theta_i)]$, which is just Euclidean distance on the logit scale.
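The identity $\theta_i/(1 - \theta_i) = w_i$ behind this remark can be confirmed numerically: computing $d^2(A, B)$ from the eigenvalues of $A^{-1}B$ and from the logits of the eigenvalues of $(A + B)^{-1}B$ gives the same value. The sketch assumes NumPy and SciPy, takes both matrices positive definite so that every $w_i > 0$, and uses a hypothetical function name.

```python
import numpy as np
from scipy.linalg import eigh


def logit_distance_two_ways(A, B):
    """d^2(A, B) from eigenvalues w_i of A^{-1}B and from theta_i of (A+B)^{-1}B."""
    w = eigh(B, A, eigvals_only=True)
    theta = eigh(B, A + B, eigvals_only=True)
    d2_w = np.sum(np.log(w) ** 2)
    d2_logit = np.sum(np.log(theta / (1 - theta)) ** 2)
    return d2_w, d2_logit


rng = np.random.default_rng(6)
X = rng.standard_normal((15, 4))
Y = rng.standard_normal((10, 4))
d2_w, d2_logit = logit_distance_two_ways(X.T @ X, Y.T @ Y)
```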

Single Wishart limit. If $\theta$ is an eigenvalue of $(A + B)^{-1}B$, then $m\theta/(1 - \theta)$ is an eigenvalue of $mA^{-1}B$. Since $mA^{-1} \to I_p$ as $m \to \infty$, information about the largest eigenvalue of a single Wishart matrix is encoded in the double Wishart setting. This is spelled out in terms of hypergeometric functions by Koev (n.d.). However, as regards asymptotics, the preliminary limit $m \to \infty$ takes us out of the domain (3), and a separate treatment of the Tracy-Widom approximation is needed. Jiang (2008) does this in the Jacobi setting for $m^2/n \to \infty$ [and assuming $p/n \to c \in (0, \infty)$]. For the single Wishart matrix problem, in the complex case, see El Karoui (2006), and for the real setting, Ma (n.d.). The last two references both focus on second-order accuracy results.

On the assumption that p is even. The method of proof of Theorem 1 
relies on a determinantal representation (48), that is, valid in the real case 
only for p (written as N + 1 in the notation there) even. There is no such 
concern in the complex case. 

Numerical investigation, both for this paper (Table 1) and its companion 
[Johnstone (2009)], suggests that the centering and scaling formulas (5) and 
Tracy- Widom approximation work as well for p odd as for the p even cases 
considered in the proofs. However, theoretical support for this observation 
remains incomplete. On the one hand, interlacing results would allow the 
largest eigenvalue for p odd to be bracketed between settings with p ± 1 
[e.g., Chen (1971), Golub and Van Loan (1996), Corollary 8.6.3]. On the 
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other hand, attempts to translate this directly to an 0(p~ 2 ^ 3 ) bound for 
the approximate distribution of W p in Theorem 1 encounter the following 
obstacle. Writing fi p + a p s = fi p +i + a p+ \s' so as to exploit the convergence 
result for p + 1 leads to 

s' - s = ([J,p+i - f-ip)/<T P + (a p+ i/a p - l)s, 

and calculations similar to those for Lemma 4 below show that — 
li p )/a p is generally 0(p~ 1 ^ 3 ). 

3. Jacobi ensembles. We turn to a formulation of our results in the no- 
tation of random matrix theory (RMT), which provides tools and results 
needed for our proofs. A probability distribution on matrices is called an 
ensemble; that ensemble is termed unitary (resp., orthogonal) if the matrix 
elements are complex (real) and it is invariant under the action of the uni- 
tary (orthogonal) group. Because of the invariance, interest focuses on the 
joint density of the eigenvalues. A class of such ensembles of special interest 
in statistics has joint eigenvalue densities of the form 

N 

/jv,/3 = c N ,f3 n \ x 3 - x kf n w p( x j)- 

j<k j=l 

The index (3 = 1 for real (orthogonal) ensembles and (5 = 2 for complex 
(unitary) ones. Here uup is one of the classical weight functions from the 
theory of orthogonal polynomials [Szego (1967)], for which logu; is a rational 
function with denominator degree d < 2. Most studied are the Gaussian 
ensembles (d = 0), with w(x) = e~ x / 2 , leading to Hermite polynomials, and 
corresponding to the eigenvalues of iV x N Hermitian matrices with complex 
or real entries. 

Next (d = 1) is the weight function w(x) = e~ x x a of the Laguerre poly- 
nomials, corresponding to the eigenvalues of Gaussian covariance matrices, 
or equivalently to singular values of iV x (N + a) matrices with independent 
Gaussian real or complex entries. 

Our interest in this paper lies with the final classical case (d = 2), with 
weight w(x) = (1 — x) a (l + x)^ leading to the Jacobi polynomials P^{x). 
While the associated Jacobi unitary and orthogonal ensembles may have 
received relatively less attention in RMT, they may be seen as fundamental 
to the classical null hypothesis problems of multivariate statistical analysis. 

Remark. Second-order convergence results, with centering and scaling 
constants, are developed for the Laguerre case by El Karoui (2006) and Ma 
(n.d.) for the complex and real cases, respectively. A forthcoming manuscript 
will describe A -2 / 3 convergence in the simplest Gaussian ensemble settings. 
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Unitary case. Along with the Gaussian and Laguerre ensembles, the 
eigenvalue density (17) of the Jacobi ensemble has the form cn Y\i w(xi) ■ 
A^y(x) with w{x) = (1 — x) a {\ + x)P being one of the classical weight func- 
tions of the theory of orthogonal polynomials, and 

^n{x) = Y[(xi - Xj ) = detfxj" 1 ] 

i<j 

being the Vandermonde determinant. Let 4>k( x ) be the functions obtained 
by orthonormalizing the sequence x k w 1 ^ 2 (x) in L 2 (-l,l). In fact, 

(15) Mx)=h- k 1/2 w 1 / 2 (x)P^(x), 

where Pu (x) are the Jacobi polynomials, defined as in Szego (1967). By a 
standard manipulation of the squared Vandermonde determinant, the eigen- 
value density has a determinantal representation 

1 

W\ 

with the correlation kernel having a Mercer expansion 

N-l 

(16) S N:2 (x,y) = <f>k(v)<l>k{y)- 

k=0 

The joint density of the eigenvalues is assumed to be 

N 

(17) f N)2 (x) = c~[[(l- Xl ) a (l + XifUixi - x 3 f. 

i=l i<j 

With the identifications 



Jn,2{x) = — det[S Nt2 (xj,x k )] 



(18) 





( p 


3-" 


m — p 




\ n — p 



1 + x 



we recover the joint density of the roots of the double Wishart setting given 
at (10). 

Our asymptotic model, which is equivalent to (3), assumes that a = a(N) 
and 8 = @{N) increase with iV in such a way that 

/ s a(N) , . B(N) 

(19) -y^aooG^tx)), ^_i^ &ooe [ ,oo). 

The dependence on iV will not always be shown explicitly. Introduce param- 
eters: 

a — 8 a + 8 

(20) k n = a + 3 + 2N + 1, cos(^ = -, cos7 = - 
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and centering and scaling constants (on the x-scede): 

\ s 2sin 4 ((^ + 7) 

(21) x N = -cos((^ + 7), ^ = -2-^ : . 

k n sin (,9 sin 7 

It will turn out that a better approximation is obtained working on the 
u-scale defined through the transformation x = tanh u. The centering and 
scaling parameters become 

(22) un = tanh -1 xn, t~n = <tat/(1 — x 2 N ). 

The final centering and scaling parameters are suitable averages of those 
required for approximation at polynomial degree N and N — 1: 

(23) fi= _ 1 _ 1 , a =2{ t n + T N-i)- 

T N + T N-l 

Theorem 3. There exist positive finite constants c and C(sq) so that 
for s>s 

|P{(tanh -1 a; (1) - fi)/a < s} - F 2 (s)\ < CN~ 2 / 3 e~ cs . 

A consequence of our approach is a convergence result for the two-point 
correlation kernel, rescaled by r(s) = tanh(/i + as), to the Airy kernel 

(24) g4( ,, t)= AiMAi-( t )-Ai( t )Ai'M 

s — t 

Indeed, uniformly on half intervals [so,oo), we show (in Section 7) that 

(25) ^T>(s)T>(t)S N , 2 (T(s),T(t)) = S A (s,t) + 0(iV- 2 / 3 e-(^/ 4 ). 

Orthogonal case. Suppose that N + l is even. The joint density of the 
eigenvalues is assumed to be 

N+l N+l 

(26) /(x)= c n(i-^) ( °- i)/2 (i+^) (/3 " i)/2 n 

i=l i<j 

With the identifications 

(27) I a J = I m — p 
\ P I \n-p. 

we recover the joint density of the roots of the real double Wishart setting 
given at (7). We match N+l (rather than N) to p because of a key formula 
relating the Jacobi orthogonal ensemble to the Jacobi unitary ensemble, (50) 
below. 
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The asymptotic model is the same as in the unitary case, that is, (19), as 
are the definitions of (kn,^,^) in (20) and (xn,o~n) in (21). 
The final centering and scaling parameters are given by 



Theorem 4. With fj,,a defined by (28), there exist positive finite con- 
stants c and C so that for s> sl 



Related work. The first asymptotic analyses [e.g., Nagao and Wadati 
(1993), Nagao and Forrester (1995)] of the Jacobi correlation kernel assumed 
a, (3 fixed as N — > oo. As a result, the upper limit of the N eigenvalues 
equalled the upper limit of the base interval [—1,1], a "hard" edge. 

The "double scaling limit" (19), natural for statistical purposes, has also 
arisen recently in RMT. Baik et al. (2006) [see also Deift (2007)] develop 
a probabilistic model leading to JUE for the celebrated observations of 
Krbalek and Seba (2000) that the bus spacing distribution in Cuernavaca, 
Mexico is well modeled by GUE. Baik et al. (2006) consider the double scal- 
ing limit (19) in the bulk. 

Turning to the edge, Collins (2005) has shown that the centered and scaled 
distribution of the largest eigenvalue (in fact eigenvalues) of JUE converge 
to the Tracy-Widom distribution F% under asymptotic model (19) in the 
"ultraspherical" case in which a(N) = (3(N). Our Theorem 3 applies also 
when a(N) ^ (3{N) and provides an 0(iV~ 2 / 3 ) rate bound. Collins uses a 
somewhat different centering and scaling, and with those proves convergence 
of the two-point correlation kernel with error 0(N~ 1 ^ 3+£ ). 

We remark that the universal Airy scaling limit arises in the double scaling 
limit because (19) forces the upper edge to be "soft," converging to Xoo < 1. 

4. Strategy of proof. A kernel A(x,y) defines an operator A on functions 
g as usual via (Ag)(y) = J A(x,y)g(y) dy. For suitable functions /, denote 
by Sf the operator with kernel S(x, y)f(y). Let En denote expectation with 
respect to the density function (17). A key formula for unitary ensembles 
[e.g., Tracy and Widom (1998)], valid in particular for (17), states that 



(28) H = un, ct = tn, 

where (mat, tat) are as i n (22). Thus, after inserting (21) into (22) 




|P{(tanh- 1 x (1) -pi)/a<s}- Fi(a)\ < CN~ 2/3 e^ 



N 



(30) 



En II t 1 + /C*y)] = det (^ + S N ,2f) 
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where the right-hand side is a Fredholm determinant of the operator Sjv,2/ 
[Riesz and Sz.-Nagy (1955), Gohberg and Krein (1969), Chapter 4]. The 
choice / = — X0) where Xo(^) = I(x ,i]( x ), yields the determinantal expression 
for the distribution of x^y. 

(31) F N2 (x ) = p\ max xj < x } = det(I - S N2 Xo)- 

Tracy and Widom (1994) showed that the distribution F 2 has a determi- 
nantal representation 

F 2 (s Q ) = det(I-S A ), 

where S A denotes the Airy operator on L 2 (sq, oo) with kernel (24). We 
introduce a rescaling x = t(s), with xo = t(sq). To derive bounds on the 
convergence of -F/v,2(^o) to F 2 (so), we use a bound due to Seiler and Simon 
(1975): 

(32) | det(I - S T ) - det(/ - S A )\ < \\S T - S A \\i exp(||5 r ||i + \\S A \\i + 1). 
Here, operator S T has kernel 

(33) S T (s,t) = ^r'(s)T'(t)S N (r( S ),T(t)) 

and is a suitably transformed, centered and scaled version of and || • ||i 
denotes trace class norm on operators on L 2 (sq, oo). The role of the nonlinear 
transformation contained within r will be discussed further below. This 
bound reduces the convergence question to study of convergence of the kernel 
S T (x,y) to S A (x,y). For this, we use integral representations of both kernels. 
For the Airy kernel [Tracy and Widom (1994)] 

(34) S A (s,t)= Ai(s + z)Ai(t + z)dz, 

Jo 

while for the Jacobi kernel, we use a formula to be found in Forrester (2004), 
Chapter 4. To state it, define 

7 , , flWtanhu) - , , SWtanh-u, tanhu) 

(35) 4>N{U) = r > S N)2 {U,V)- 



cosh u cosh u cosh v 

Then, from the final display in the proof of Forrester's Proposition 4.11, 

Sn,2{U,V) = / [cpNiu + WjCpN^lv + W) 



(36) 

+ 0at_i(u + w)4>n{v + w)] dw. 

The convergence argument will therefore be based on bounding the con- 
vergence of a suitably transformed, centered and scaled version of the weighted 
Jacobi polynomials 4>n(x) and 4>m-i{x) to the Airy function Ai(s). 



LARGEST EIGENVALUE IN MULTIVARIATE ANALYSIS 



21 



m-'jv(x) n'iv(ianh u) 




Fig. 2. Top ie/t; weighted Jacobi polynomial w N (x) = (1 - x 2 ) 1/2 w 1/2 (x)P^ ,li \x) for 
a = 10, /3 = 5, N = 20. Bottom left: focus on wn{x) in neighborhood of largest zero. Top 
right: nonlinear transformation of abscissa: wjv(tanhw). Bottom right: limiting Airy func- 
tion Ai(s). Note improvement in approximation due to stretching of abscissa by hyperbolic 
tangent. 

The strategy for approximation by Ai(s) is shown in Figure 2. The weighted 
Jacobi polynomial 

(37) w N (x) = (1 - x)( a+1 )/ 2 (l + x)^l 2 P^{x) 

has ./V zeros in (—1,1), shown in the top left panel. Zooming into a neigh- 
borhood of the largest zero (bottom left panel) shows a similarity with the 
graph of the Airy function. The nonlinear transformation x = tanhu of the 
abscissa is suggested by the form of the integral representation (35)-(36); in 

particular of course, it stretches x£ (—1,1) to u£ (—00,00). The top right 

1/2 * 

panel shows u;jv(tanhii) = hj^ 0jv(w): the stretching of the abscissa has im- 
proved the visual approximation to the Airy function, especially in the right 
tail. 

To carry out the Jacobi polynomial asymptotics, several approaches are 
available, including saddle-point methods based on a contour integral repre- 
sentation [e.g., Wong and Zhao (2004)] and Riemann-Hilbert methods [e.g., 
Kuijlaars et al. (2004)]. Our situation is nonstandard because our model 
supposes that the parameters a(N),/3(N) increase proportionally with N. 
We use the Liouville-Green approach set out in Olver (1974), since it comes 
with ready-made bounds for the error of approximation which are of great 
use in this paper. The Liouville-Green approximation relies on the fact that 
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Jacobi polynomials and hence the function wn satisfy a second-order differ- 
ential equation, (71) below, which may be put into the form 

(38) w"(x) = {K 2 f(x)+g(x)}w(x), 



where k = 2N + a + (3 + 1 is the large parameter, and 

f on\ ft \ (x-x N -)(x-x N+ ) 3 + x 

(39) f{x) = — ^ , g(x) ~ 



2 



4(1 -x 2 ) 2 ' ^ ' 4(1 -x 2 ) 2 ' 

The values xn- and x^+, given precisely at (75) below, are approximately 
the locations of the smallest and largest zeros of P N \ respectively. They are 
the turning points of the differential equation; for example, wn(x) passes 
from oscillation to exponentially fast decay as x moves through xn+- 

The Liouville-Green transformation is defined by ignoring g(x) in (38) 
and transforming the independent variable x into £ via the equation £ 1//2 dC = 
f l ' 2 (x)dx, or equivalently 

(40) (2/3)C 3/2 = f f l l\x')dx'. 

J a;_|_ 

Then W = (dQ/dx) l l 2 w is close to satisfying the equation d 2 W /dC? = k 2 (W, 
which is a scaled form of the Airy differential equation, and so has linearly 
independent solutions in terms of Airy functions, traditionally denoted by 
Ai(K 2 / 3 C) and Bi(ft 2 / 3 £). In fact, it turns out that 

^(x) = c^(C(x))^ 1 / 2 Ai( K 2 / 3 C(x)). 

The value of the constant cat is fixed by matching the behavior of both sides 
as x — > 1 (Section A. 4). For an approximation near xn = %N+, we introduce 
a new scaling x = xn + &nsn- To fix the scale on, we linearize ((xn + &nsn) 
about its zero at xn and choose on so that k 2 ^ 3 ((x) = sn- The resulting 
on is of order TV" 2 / 3 . To summarize the results of this local approximation 
and matching, define a particular multiple of wn(x), namely 

(41) ^ N {x) = {l-x 2 ) l l 2 ^ N (x)/^d^. 

Use of the Liouville-Green error bounds (Section 6.1) establishes that for 
s L < s N < CN 1 ^, 

(42) 4> N (x N + s N o N ) = Ai(s N ) + 0(iV" 2 / 3 e-^/ 2 ). 

u-scale. Consistent with the top right panel of Figure 2, we need a trans- 
lation of this approximation to 4>n(u) = ^Ar(tanhu). The u-scale centering 
and scaling (un,t~n) are found by matching tanh(uN + Twi) = xn + o^t to 
first-order, and yield, for to<tN< CiV 1 / 6 

(43) $ N (u N + T N t N ) = Ai(t N ) + 0(iV- 2 / 3 e-^/ 2 ). 
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In an entirely parallel development, there is an analogous result at degree 
N-l, again for t < t N -i < CA 1 / 6 , 

(44) 4> N -x{u N -i + T N - X t N - X ) = Ai(tAr-i) + 0(7V- 2 / 3 e -^-i/ 2 ). 

s-scale. A final calibration of the variables and t^-i is needed to 
match with the variable s in the Airy function scale. Letting and a denote 
the centering and scaling for this calibration, yet to be determined, we set 

(45) (j) T (s) = 4> N (n + as), ip T (s) = 4> N -i(fi + as). 

The change of variables u = fi + as, v = \i + at,w = az in (36) leads to 

(46) S T {s,t) = — \ci) T (s + z)i> T (t + z)+il) T {s + z)<t )T {t + z)}. 

2 Jo 

The coefficient e N = 1 + 0(A _1 ), as is shown at (138)-(140) below. 

The expressions (43), (44) indicate that 0{N~ 2 / 3 ) error is only attainable 
for both (p T and ip T using separate scalings and t^-i - The choice of [i and 
a can be made to transfer that N~ 2 ^ 3 rate to a particular linear combination 
of 4> T and i/j t : in the complex case, the bound 

\Ms) + Ms)\<CN^ 3 e- s / A 

is convenient for achieving an _/V~ 2 / 3 approximation of (46), and indeed, this 
forces the particular choices of \x and a in (144) and hence in Theorem 2. It 
is important for convergence of the integral in (46) that the above bound be 
global — valid on the right half line — and thus extending the "local" results 
of (43) and (44). This argument is set out in detail in Section 7. 

Remarks, (a) The function appearing in the Liouville-Green trans- 
form 

rj _ y / {x-X^)(x-X + ) 

Vf{ } ~ 2(1 -x 2 ) 

is the same as the limiting bulk density of the eigenvalues found by Wachter 
(1980). The same phenomenon occurs in the single Wishart case. 

(b) Our approximations are centered around the turning point of differ- 
ential equation (38), which occurs at s = in its Airy limit. The quantile 
s = occurs beyond the upper quartile of the Tracy- Widom F\ distribu- 
tion, and so it is perhaps not surprising that the numerical quality of the 
Tracy-Widom approximation in Table 1 is better in the right tail of the 
distribution. It is a fortunate coincidence that precisely the right tail is the 
one of primary interest in statistical application. 
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Orthogonal case. A determinant representation for (5 = 1 analogous to 
(30) was developed by Dyson (1970). Tracy and Widom (1998) give a self- 
contained derivation of the formula 



N+l 



(47) ^n[ i +/(^)]=v det ( /+ ^+^)' 

3=1 

with its immediate consequence, for / = xo = — ^(z ,i)> 

(48) p| i< max +i x k < x \ = y/det(I - K N+lX o). 

Here we must assume that N + 1 is even, and then K^+i is a 2 x 2 matrix- 
valued operator whose kernel has the structure 

(49) K N+1 (x,y) = ® 2 \ S N+h i(x,y) - ( £ , x °_ y ^ 

Here di denotes the operator of partial differentiation with respect to the 
second variable, and e± the operator of convolution in the first variable with 
the function e(x) = ^ sgn(x). Thus (eS)(x,y) = J e(x — u)S(u,y) du. Finally 
T denotes transposition of variables TS(x,y) = S(y,x). 

The derivation of Tracy and Widom (1998) does not completely deter- 
mine Sn+i,i(x, y). More explicit expressions, developed by Dyson (1970) 
and Mahoux and Mehta (1991), use families of skew-orthogonal polynomi- 
als. Adler et al. (2000) relate these skew polynomials to the orthogonal poly- 
nomials occurring in the unitary case. A key observation is that the weight 
function of the orthogonal ensemble should be suitably perturbed from that 
of the corresponding unitary ensemble. This leads Adler et al. (2000) to a 
formula that is central for this paper: 

(l-y 2 \ l/2 

(50) S N+lj i(x,y) = I -— J S Nj 2{x,y) + a N K N (j) N (x)(£4> N - 1 )(y). 

Here Sjsr^ is the unitary kernel (16) associated with the Jacobi unitary en- 
semble (17), and 

(51) k N = {2N + a + (3)/2, 



(52) (f> N (x) = (j) N {x)/VT^. 

The orthogonal kernel is thus expressed in terms of the unitary kernel and 
a rank-1 remainder term. The formula allows convergence results from the 
unitary case to be reused, with relatively minor modification. 
As regards the limit, Tracy and Widom (2005) showed that 



F 1 (s ) = Jdet(I-K G OE), 
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where, for the purposes of this paper, the GOE kernel may be written 

/ -fib" 



KaoE{s,t) 

(53) 



-£l 



S(s,t) 



+ 





-e(s-t) 



Ai(s) 
_e(Ai)(t) -e(Ai)(s) Ai(i) 

with 

S(s,t) = S A (s,t) - ±Ai(s)e(Ai)(t). 

Here (if)(s) = / s °°/ and (iiS)(s,t) = S(u,t) du. We use £i in place of 
e in (53) because convergence to Ai(s) is stable in the right-hand side, but 
oscillatory and difficult to handle in the left tail. 

We bound the convergence of i^jv+i.i^o) to Fi(sq) via an analog of (32): 

\F N+1 (s ) - Fi(s )| 

(54) <C(so)C(K t ,K GO e) 

X \ W K r,ii - KGOE,ii\\l + X] - K G OE,ijh >■ 

Here, in analogy with (33), 

K T {s,t) = sJr'{s)r'{t)K N+1 {T{s),T{t)). 

The detailed work of representing K T — Kqo e in terms of the transformation, 
centering and scaling implicit in r is done in Section 8.3. 

As noted by Tracy and Widom (2005), a complication arises in the or- 
thogonal case: as so far described, K T is not a trace class operator, as would 
be required properly to define the (Fredholm) determinant. This obstacle is 
evaded by regarding K T as a matrix Hilbert-Schmidt operator on L 2 (p) © 
L 2 (/9 _1 ) where L 2 (/3 ± ) are weighted Hilbert spaces L 2 ([so, oo), p ± (s) ds). 
We assume at least that p" 1 £ L , and so, in particular, it follows that 
e:L 2 {p)^L 2 (p~ 1 ). Section 8.2 has more detail on this. 

A few further remarks on the origin of the N~ 2 ^ 3 rate in the orthogo- 
nal case. In the unitary case, the N~ 2 ^ 3 rate of convergence for the kernel 
S T (s,t) was obtained by a calculated trade-off of centering and scaling in 
the approximations for (j) T and tjj T , at degrees and N — 1, respectively. In 
the orthogonal case, a cancellation of A" 1 / 3 terms, somewhat fortuitous and 
unexplained, occurs between the integral and rank-1 terms in (50), so that 
such a calculated trade-off is not required. More specifically, we reuse the 
unitary case approximations to <fi T and tp T , but now with the straightforward 
choices fi = u^,a = in (45). In Section 7.3 it is shown that 

(55) \4> T {s) - Ai(a)| < CN~ 2 / 3 e~ s/ \ 

(56) |Vv(s) - Ai(s) - A N Ai'(s)| < CN~ 2 / 3 e- s/ \ 
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where Ajv = (u N - u N -t)/T N -i = 0(iV _1/3 ). Thus, to obtain the A^ 2 / 3 
rate for ip T here, it is necessary to retain the derivative term, itself of order 

JV-V3. 

Focus on the (1, 1) entries of rescaled K^+i and its limit Kqoe- In Section 
8.3, it is shown that (50) may be written 

T'(s)S N+ljl {T(s),r{t)) = e N [S T (s,t) + \4> T {s)e^ T {t)\. 

In contrast with the unitary case, the use of (55) and (56) leads to an A -1 / 3 
term: 

S T (s, t) = S A (s, " 4f Ai(s) Ai(t) + 0(A" 2 / 3 ). 
Turning to the rank-1 term and again using (55) and (56), 

i<M*)Nv)(i) = \ Ai( S )[l - e(Ai)(t)] + ^ Ai( s ) Ai(t) + 0(A" 2 / 3 ), 

and so, remarkably, the A -1 / 3 terms in the previous two displays cancel, 
yielding A~ 2 / 3 convergence, at least for the (1,1) entry. The remainder of 
the convergence argument — including the operator norm bounds to give the 
e - s o/2 dependence — may be found in Section 8.4. 



Relation to other work. There is a large literature on the asymptotic be- 
havior of Jacobi polynomials as A — > oo. For fixed a and /3, classical results 
are given in Szego (1967); for more recent results [see, e.g., Kuijlaars et al. 

(2004) and Wong and Zhao (2004)]. There is a smaller, but growing, litera- 
ture on results when a and (3 depend on iV and tend to infinity with N [see, 
e.g., Chen and Ismail (1991) and Bosbach and Gawronski (1999)]. Closer to 
our approach is Dunster (1999), who uses Liouville-Green transformations 
to study ultraspherical polynomials (a subclass of Jacobi polynomials) with 
a = (3 proportional to N, and provide approximations in terms of Whittaker 
functions. Carteret, Ismail and Richmond (2003) give Airy approximations 
to Jacobi polynomials with one of the parameters proportional to N. Collins 

(2005) , Lemmas 4.12 and 4.14, provides Airy approximations similar to (42), 
but with error term 0(A^ -2 / 3+£ ); his proof uses the differential equation sat- 
isfied by (37), but not the specific Liouville-Green method adopted here. 

5. Jacobi polynomials; preliminaries. We collect here some useful facts 
[Szego (1967), Chapter 4] about the Jacobi polynomials Pn = (x). They 
are orthogonal with respect to the weight function w(x) = (1 — x) a (l + x)@ 
on [—1,1], and have L2 norms: 

(57) h N = Pi(x)w(x)dx = ; , v '-. 
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The leading coefficient Pjy (x) = l n x + • • • is 




) 



and the value at x = 1 is 




The Christoffel-Dar boux formula states that 



N-l 

(60) S N)2 (x,y) = 4>k{x)(j)k{y) =a N 



4> N (x)(j) N -i(y) - 4> N - 1 (x)(j) N (y) 



x-y 




5.1. Parameterizations. We collect and connect several equivalent pa- 
rameter sets which are each useful at certain points. 

(a) Statistics parameters (p,m,n). These describe the parameters of the 
two Wishart distributions described above. We are adopting the notation 
of Mardia, Kent and Bibby (1979), who interpret the parameter p as "di- 
mension," m as "error" degrees of freedom and n as "hypothesis" degrees 
of freedom. 

(b) Jacobi parameters (N,a,f3). These are the parameters appearing in 
the conventional Jacobi polynomials [Szego (1967), Chapter IV]. The con- 
nection to the Wishart matrix parameters is given in the complex case by 



[For the real case, see (18) below.] The conditions m,n>p correspond to 
a,/3>0. 

(c) Liouville-Green form (k, A,/i). To describe compactly the form (87) 
of the Jacobi differential equation below, introduce 



(62) 




(63) 



k = 2N + a + P+l, 
A = a/n > 0, 
/x = /3/k > 0. 
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(d) Trigonometric forms ( r y,(p). 

(66) cos7 = A + /i, cos <p = X — fx. 
From the definitions of (k, A, q), we deduce the ranges 

(67) 0<7<vr/2, 0<^<tt, 7<v?. 
The last four systems are related by the equalities 

faa\ \ , a + b a + P 

(68) cos7 = A + /x 

(69) cos(/? = A — /i 



a + b + 2 a + /3 + 2iV + ' 
a — b a — (3 



a + b + 2 a + (3 + 2N+ 
(e) Half angle forms (<p/2,j/2). 

2 , _ p+l/2 . 2 , /n . n + 1/2 



sin 2 ( 7 /2) = " ' , sin%/2) 



m + ra + 1 ' m + n + l' 

as may be seen by using cos 7 = 1 — 2 sin 2 (7/2) on the left-hand side and 
relations (18) on the right-hand side of (68) and (69). 

5.2. Differential equation for Jacobi polynomials. The weighted Jacobi 
polynomial 

(70) w N (x) = (1 - x)( Q+1 )/ 2 (l + x)^l 2 P^{x) 

satisfies a second-order equation without first-order term [Szego (1967), 
(4.24.1)] 

(71) w"{x) = q{x)w{x) = — _ w(s), 

where the quadratic polynomial 

n(x) = (a 2 - l)(x + l) 2 + {(3 2 -l){x- if 

(72) 

+ [AN{N + a + (3 + 1) + 2(a + l)(/3 + l)](x 2 - 1). 
This equation may be put into a form suitable for asymptotics, namely 

(73) w"(x) = {K 2 f(x)+g(x)}w(x), 
by using the Liouville-Green parameters (63) to set 

. x 2 + 2(A 2 -/x 2 )x + 2A 2 + 2// 2 -l 3 + x 2 

(74) / (x) = i 5^ i= , - 



4(1 -x 2 ) 2 ' yv ; 4(1 -x 2 ) 2 ' 

These choices for /(x), <?(x) and k [taken from Dunster (1999), (4.1)] are 
not unique; however, our goal of obtaining approximations to wn(x) with 
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-1 






\^/(2 U -l) 




1 a'=cosi? 



u « + 1 u=cos 2 (i3/2) 

=-t5 = r J scale 
2 

Fig. 3. Relationship of some key parameters. The turning points x+ and X- are shown 
on both the "Jacobi" scale x £ [—1,1] and the "squared correlation" scale u = r ' 2 £ [0,1], 
along with the respective angle and half-angle interpretations. 



error bounds 0(1/ k) imposes constraints which lead naturally to this choice 
(Section A. 2 has some details). 

The turning points of the differential equation are given by the zeros of 
/, namely 

(75) x± = fx 2 — \ 2 ± y/{l - (A + M ) 2 }{1 — (A — /i) 2 } 

(76) = — cos(v9 ± 7) = cos(-7r — if =F 7), 

where we used (68) and (69). Where necessary to show the dependence on 
N, we write xn±- 

It is easily verified that ?9± = tt — (ip ± 7) € [0, tt]. Indeed (67) entails 
that ■!?_ < 7r, while A > implies via (68) and (69) that cosy? > — cos 7 = 
cos(7r — 7), and hence that ip < tt — 7 and so ??+ > 0. In particular 

(77) - 1 < x- < x + < 1 
and 

(78) x + — x„ = 2 sin sin 7. 

The situation is summarized in Figure 3. 

Furthermore, a little algebra with (75) shows that both 

(79) x + <l 44> A>0 44> m>p, 

(80) X->-l 44> /i>0 44> n>p. 
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For later reference, note that we may rewrite / as 

x — XN- 



(81) f(x) = (x-x N+ )k(x), k(x) - _ ^2)2- 

In particular, by combining (78) and (76), we have 

x n+ — xn- sin ip sin 7 



(82) k N 



4(l-x 2 N+ y 2sinV + 7 ) 



5.3. Asymptotic setting. The asymptotic model used in this paper sup- 
poses that there is a sequence of eigenvalue distribution models, such as 
(17) and (26), indexed by N, the number of variables. The Jacobi parame- 
ters a = a(N),/3 = (3(N) are regarded as functions of N. 

Assumption (A). In the following equivalent forms: 
For (N,a,P): 

, x a(N) , . B(N) 

(83) -^^aooG (0,oo), ^^^6ooG [0,oo). 

For («, A, [/,): 

(84) Atv —> Aqo, hn —> fJ-oo s.t. Aqo > 0, Aqo + ^ < 1 . 
For (p,m,n): 

(85) — >Poo>0, > aoo + l>l. 

m + n p 

We emphasize two consequences of these assumptions. First, that limxjv-f < 
1, so that the right edge is "soft," that is, separated from the upper limit of 
support of the weight function w(x). Indeed, from (75), x+ = 1 if and only 
if A = 0, and since A = a/(2 + a + b), this is prevented because doo S (0, 00) 
and b^ < 00. 

Second, we have limx7v + — xn~ > 0, so that the two turning points are 
asymptotically separated. Indeed from (78), x + > x_ if and only if 7 > 0, 
and from (68) this occurs if and b^ are both finite. 

The constants in our asymptotic bounds (such as Theorems 3 and 4) will 
depend on the limiting values 000,600, or their equivalent forms in (84) or 
(85). These dependencies will not be worked out in detail; instead we will 
use the following somewhat less precise approach. 

Introduce the sets 

D = {(A, n) : A > 0, n > 0, A + n < 1}, 

D s = {(A, n) G D : A > 5, A + fi < 1 - <5}. 

Our results will be valid with C = C{5) for all N such that (Xn,^n) £ Ds- 
Under the asymptotic assumptions (A), some straightforward simplifica- 
tions in formulas occur: 



LARGEST EIGENVALUE IN MULTIVARIATE ANALYSIS 



31 



Lemma 1. The coefficient a at defined at (61) satisfies 

(86) a N = (±smipsm-f)(l + 0(N- 1 )). 

6. Jacobi polynomial asymptotics near largest zero. The goal of this 
section is to establish an Airy function approximation to a version of the 
weighted Jacobi polynomial 4>n(x) in a neighborhood of its largest zero, or 
more correctly, in a shrinking neighborhood about the upper turning point 
a; 7v of (75). More precisely, in terms of the weighted version of 4>n defined at 
(41) and the scaling parameter on defined at (104) below, we develop local 
approximations of the form 

4>n(x n + s N a N ) = Ai(s N ) + 0(AT- 2 / 3 e-W 2 ), 
valid for sn G [sljCN 1 / 6 ] and N > Nq sufficiently large. 

6.1. Liouville-Green approach. We begin with an overview of the Liouville- 
Green approach to be taken, following [Olver (1974), Chapter 11]. The classi- 
cal orthogonal polynomials [such as Laguerre L N {x), Jacobi P N '^ \x)\ satisfy 
a second-order linear ordinary differential equation which, if the polynomials 
are multiplied by a suitable weight function, may be put in the form 

of IV 

(87) -j^j = q(x)w(x) = {k f(x) + g(x)}w(x), xe(a,b), 

where k = k(N) is a parameter, later taken as large. The precise decompo- 
sition of q into K 2 f + g is made in order to obtain 0(1/N) error bounds 
below. A zero of / is called a turning point because, as will be seen in our 
example, it separates an interval in which the solution is of exponential type 
from one in which the solution oscillates. We will assume, for some interval 
(a, b) containing x*, that f{x)/(x — x*) is positive and twice continuously 
differentiable and that g(x) is continuous. 

Define new independent and dependent variables £ and W via the equa- 
tions 




These choices put (87) into the form 

(88) — = {K 2 C + V(C)W 

where the perturbation term ip(() = f~ l ^{d 2 /dC I 2 )(f l / A ) + g/f. Here / is 
defined by 

(89) f(x) = (d(/dx) 2 = f(x)/(. 
If the perturbation term $\psi(\zeta)$ in (88) were absent, the equation $d^2W/d\zeta^2 = \kappa^2\zeta W$ would have linearly independent solutions in terms of Airy functions, traditionally denoted by $\mathrm{Ai}(\kappa^{2/3}\zeta)$ and $\mathrm{Bi}(\kappa^{2/3}\zeta)$. Our interest is in approximating the recessive solution $\mathrm{Ai}(\kappa^{2/3}\zeta)$, so write the relevant solution of (88) as $W_2(\zeta) = \mathrm{Ai}(\kappa^{2/3}\zeta) + \eta(\zeta)$. In terms of the original dependent and independent variables $w$ and $x$, the solution $W_2$ becomes

$$w_2(x,\kappa) = \hat f^{-1/4}(x)\{\mathrm{Ai}(\kappa^{2/3}\zeta) + \varepsilon_2(x,\kappa)\}. \tag{90}$$

Olver [(1974), Theorem 11.3.1] provides an explicit bound for $\eta(\zeta)$, and hence for $\varepsilon_2$ and its derivative, in terms of the function $\mathcal V(\zeta) = \int_\zeta^\infty|\psi(v)v^{-1/2}|\,dv$. To describe these error bounds even in the oscillatory region of $\mathrm{Ai}(x)$, Olver (1974) introduces a positive weight function $E(x) \ge 1$ and positive modulus functions $M(x) \le 1$ and $N(x)$ such that

$$|\mathrm{Ai}(x)| \le M(x)E^{-1}(x), \qquad |\mathrm{Ai}'(x)| \le N(x)E^{-1}(x) \qquad \text{for all } x. \tag{91}$$

[Here, $E^{-1}(x)$ denotes $1/E(x)$.] In addition,

$$\mathrm{Ai}(x) = \frac{1}{\sqrt2}M(x)E^{-1}(x), \qquad x \ge c = -0.37, \tag{92}$$

and the asymptotics as $x \to \infty$ are given by

$$E(x) \sim \sqrt2\,e^{\frac23x^{3/2}}, \qquad M(x) \sim \pi^{-1/2}x^{-1/4}, \qquad N(x) \sim \pi^{-1/2}x^{1/4}. \tag{93}$$
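A quick numerical check of (92)–(93), assuming only the standard Maclaurin series for $\mathrm{Ai}$: replacing $M$ and $E$ by their leading terms in (93) turns $M(x)E^{-1}(x)/\sqrt2$ into $\tfrac{1}{2\sqrt\pi}x^{-1/4}e^{-\frac23x^{3/2}}$, so the ratio of $\mathrm{Ai}(x)$ to this expression should approach 1 as $x$ grows. The series evaluation is a sketch suited to moderate $|x|$ only; for large positive $x$ it suffers from cancellation.

```python
import math


def airy_ai(x, n_terms=60):
    """Ai(x) from its Maclaurin series Ai = Ai(0) f(x) + Ai'(0) g(x); for moderate |x| only."""
    c1 = 1.0 / (3.0 ** (2.0 / 3.0) * math.gamma(2.0 / 3.0))  # Ai(0)
    c2 = 1.0 / (3.0 ** (1.0 / 3.0) * math.gamma(1.0 / 3.0))  # -Ai'(0)
    total, c, d = 0.0, 1.0, 1.0  # c, d: coefficients of x^(3k) in f and x^(3k+1) in g
    for k in range(n_terms):
        total += c1 * c * x ** (3 * k) - c2 * d * x ** (3 * k + 1)
        c /= (3 * k + 2) * (3 * k + 3)
        d /= (3 * k + 3) * (3 * k + 4)
    return total


def leading_asymptotic(x):
    """M(x) E^{-1}(x) / sqrt(2) with M and E replaced by their leading terms in (93)."""
    e_lead = math.sqrt(2.0) * math.exp(2.0 / 3.0 * x ** 1.5)
    m_lead = x ** -0.25 / math.sqrt(math.pi)
    return m_lead / e_lead / math.sqrt(2.0)


for x in (2.0, 3.0, 4.0, 5.0):
    print(x, airy_ai(x) / leading_asymptotic(x))  # ratios increase toward 1
```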
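The key bounds of Olver [(1974), Theorem 11.3.1] then state, for $x \in (a,b)$,

$$|\varepsilon_2(x,\kappa)| \le M(\kappa^{2/3}\zeta)E^{-1}(\kappa^{2/3}\zeta)\left[\exp\left\{\frac{\lambda_0}{\kappa}\mathcal V(\zeta)\right\} - 1\right], \tag{94}$$

$$|\partial_x\varepsilon_2(x,\kappa)| \le \kappa^{2/3}\hat f^{1/2}(x)N(\kappa^{2/3}\zeta)E^{-1}(\kappa^{2/3}\zeta)\left[\exp\left\{\frac{\lambda_0}{\kappa}\mathcal V(\zeta)\right\} - 1\right], \tag{95}$$

where $\lambda_0 = 1.04$. For $\kappa^{2/3}\zeta \ge c$, (92) shows that the coefficient in (94) is just $\sqrt2\,\mathrm{Ai}(\kappa^{2/3}\zeta)$. Here

$$\mathcal V(\zeta) = \mathcal V(\zeta(x)) = \mathcal V_{[x,1]}(H) = \int_x^1|H'(t)|\,dt$$

is the total variation on $[x,1]$ of the error control function $H(x) = -\int_0^{\zeta(x)}|v|^{-1/2}\psi(v)\,dv$. Section A.3 has more information on $\mathcal V(\zeta)$.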

Application to Jacobi polynomials. In the case of Jacobi polynomials $P_N^{(\alpha,\beta)}(x)$, the points $x = \pm1$ are regular singularities and the points $x_\pm$ defined by (75)–(77) are turning points. We are interested in behavior near the upper turning point $x_+$, which is located near the largest zero of $P_N$. We apply the foregoing discussion to the interval $(a,b) = (x_0, 1)$, where $x_0 = \frac12(x_+ + x_-) = \mu^2 - \lambda^2$. In particular, the independent variable $\zeta(x)$ is given in terms of $f(x)$ by

$$\tfrac23\zeta^{3/2} = \int_{x_+}^x f^{1/2}(x')\,dx'. \tag{96}$$

It is easily seen from (74) that $\zeta \to \infty$ as $x \to 1$. More precisely, it is shown at length in Section A.4, Proposition 4 that

$$\tfrac23\zeta^{3/2}(x) = \log(1 - x)^{-1} + c_{0N} + o(1),$$

where $c_{0N} = c_{0N}(\alpha,\beta)$ is given at (230).

Bound (94) is valid only if the integral defining $\mathcal V(\zeta)$ converges as $\zeta \to \zeta(b) = \infty$. That this is true for the specific choices of $f$ and $g$ made in (74) follows from arguments in Olver (1974); see Section A.2 for further details.

Remark A. For behavior near the lower turning point $x_-$, near the smallest zero of $P_N^{(\alpha,\beta)}$, we would consider instead the interval $(a, a_1) = (-1, x_0)$, with a corresponding redefinition of $\zeta(x)$, and we would require convergence of $\int_{-\infty}^{\zeta}|\psi(v)|\,|v|^{-1/2}\,dv$. This would be relevant to approximation of the distribution of the smallest canonical correlation, although this can be handled simply through (14) in Section 2.

Bound (94) has a double asymptotic property in $x$ and $\kappa$ which will be useful. First, suppose that $N$, and hence $\kappa$, are held fixed. As $x \to 1$, $\mathcal V(\zeta) \to 0$, and so from (94) and the remarks following it, $\varepsilon_2(x,\kappa) = o(\mathrm{Ai}(\kappa^{2/3}\zeta))$. Consequently, as $x \to 1$,

$$w_2(x,\kappa) \sim \hat f^{-1/4}(x)\mathrm{Ai}(\kappa^{2/3}\zeta). \tag{97}$$

If the weighted polynomial $w_N(x)$ is a recessive solution of (87), then it must be proportional to $w_2$:

$$w_2(x,\kappa) = c_N^{-1}w_N(x). \tag{98}$$

The important consequence is that $c_N$ may now be identified by comparing the growth of $w_N(x)$ as $x \to 1$ with that of $w_2(x,\kappa)$. In Section A.5 it is shown that

$$c_N = e^{\theta''/N}\kappa_N^{1/6}h_N^{1/2}, \tag{99}$$

where $\theta'' = O(1)$. Hence

$$w_N(x) = e^{\theta''/N}\kappa_N^{1/6}h_N^{1/2}w_2(x,\kappa). \tag{100}$$
The second property is that bound (94) holds for all $x \in (a,b)$, and so in any interval $(a_1, b) \subset (a,b)$ upon which $\sup\mathcal V(\zeta) < \infty$, we have

$$|\varepsilon_2(x,\kappa)| = O(1/\kappa) = O(1/N). \tag{101}$$

Comparing the definitions (70) and (15) of $w_N$ and $\phi_N$, we obtain

$$(1 - x^2)^{1/2}\phi_N(x) = h_N^{-1/2}w_N(x).$$

Combining (100) with the Airy approximation (90) to $w_2(x,\kappa)$, we obtain

$$(1 - x^2)^{1/2}\phi_N(x) = e^{\theta''/N}\kappa_N^{1/6}\hat f^{-1/4}(x)\{\mathrm{Ai}(\kappa^{2/3}\zeta) + \varepsilon_2(x,\kappa)\}. \tag{102}$$

Local scaling parameter $\sigma_N$. We are chiefly interested in values $x = x_N + \sigma_Ns$ near the upper turning point. The scaling constant $\sigma_N$ is chosen so that for fixed $s$, as $N \to \infty$, $\mathrm{Ai}(\kappa^{2/3}\zeta) \to \mathrm{Ai}(s)$. Expand $\zeta(x)$ about the turning point $x_N$:

$$\kappa^{2/3}\zeta(x) = \kappa^{2/3}\zeta(x_N + \sigma_Ns) = \kappa^{2/3}\sigma_Ns\,\zeta'_N + \tfrac12\kappa^{2/3}\sigma_N^2s^2\zeta''_N + \cdots. \tag{103}$$

Setting the coefficient of $s$ to 1 yields

$$\sigma_N = (\kappa_N^{2/3}\zeta'_N)^{-1}. \tag{104}$$
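For the same hypothetical model $f(x) = x(2 + x)$ with $x_* = 0$ (an illustration only, not the Jacobi case), the integral defining $\zeta$ has a closed form and $\zeta'(x_*)^3 = f'(x_*) = 2$. The sketch below forms $\sigma = (\kappa^{2/3}\zeta'(x_*))^{-1}$ as in (104) and checks that $\kappa^{2/3}\zeta(x_* + \sigma s)$ is close to $s$, with a discrepancy of order $\sigma s^2$ coming from the quadratic term in (103). Only $s > 0$ is handled here.

```python
import math


def zeta_model(x):
    # Closed-form LG variable for the hypothetical model f(x) = x(2 + x), x > 0:
    # int_0^x sqrt(t(2 + t)) dt = ((x + 1)/2) sqrt(x^2 + 2x) - acosh(x + 1)/2.
    integral = 0.5 * (x + 1.0) * math.sqrt(x * x + 2.0 * x) - 0.5 * math.acosh(x + 1.0)
    return (1.5 * integral) ** (2.0 / 3.0)


def sigma(kappa):
    # zeta'(x_*)^3 = f'(x_*) = 2 for this model, so (104) gives sigma = (kappa^{2/3} 2^{1/3})^{-1}.
    return 1.0 / (kappa ** (2.0 / 3.0) * 2.0 ** (1.0 / 3.0))


def rescaled(kappa, s):
    # kappa^{2/3} zeta(x_* + sigma s) with x_* = 0; (103) says this is s + O(sigma s^2).
    return kappa ** (2.0 / 3.0) * zeta_model(sigma(kappa) * s)
```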

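Consider now the coefficient of $\mathrm{Ai}(\kappa^{2/3}\zeta)$ in (102): $e^{\theta''/N}\kappa_N^{1/6}\hat f^{-1/4}(x)$. Observe from (89) that $\hat f^{-1/4}(x) = \zeta'(x)^{-1/2}$, and then note that as $x \to x_N$, we have $\zeta'(x)^{-1/2} \to \zeta_N'^{-1/2} = \kappa_N^{1/3}\sigma_N^{1/2}$ from (104). Consequently

$$\kappa_N^{1/6}\hat f^{-1/4}(x) = \sqrt{\kappa_N\sigma_N}\,\bigl(\zeta'(x)/\zeta'_N\bigr)^{-1/2}, \tag{105}$$

and so finally we have

$$\tilde\phi_N(x) := \frac{(1 - x^2)^{1/2}\phi_N(x)}{\sqrt{\kappa_N\sigma_N}} = e_Nr_N(x)\bigl[\mathrm{Ai}(\kappa^{2/3}\zeta) + \varepsilon_2(x,\kappa)\bigr], \tag{106}$$

where $e_N = e^{\theta''/N} = 1 + O(N^{-1})$ and

$$r_N(x) := \frac{\kappa_N^{1/6}\hat f^{-1/4}(x)}{\sqrt{\kappa_N\sigma_N}} = \left(\frac{\zeta'(x)}{\zeta'_N}\right)^{-1/2}, \tag{107}$$

where the second equality is just (105).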

The goal toward which we are working is a uniform bound on the Airy approximation in a local (but growing) region about $x_N$.

Proposition 1. For $x = x_N + s_N\sigma_N$ and $s_L \le s_N \le CN^{1/6}$, we have

$$\tilde\phi_N(x) = \mathrm{Ai}(s_N) + O(N^{-2/3}e^{-s_N/2}), \tag{108}$$

$$\sigma_N\tilde\phi_N'(x) = \mathrm{Ai}'(s_N) + O(N^{-2/3}e^{-s_N/2}). \tag{109}$$

Before completing the proof, we still require some further preliminaries.
Properties of the LG transformation. From (96) it is clear that $\zeta(x_+) = 0$. We exploit a decomposition

$$\zeta'(x)^2 = \hat f(x) = k(x)/\ell(x), \tag{110}$$

where, recalling (81), we have

$$k(x) = f(x)/(x - x_N) \qquad\text{and}\qquad \ell(x) = \zeta(x)/(x - x_N). \tag{111}$$

It follows from Olver (1974), Chapter 11, Lemma 3.1, or directly, that $\ell(x)$ is positive and $C^2$ in $(x_0, 1)$, which contains $x_N$. As $x \to x_N$, we have both $\ell(x) \to \zeta'_N$ and $k(x) \to k_N$, so that $\zeta_N'^2 = k_N/\zeta'_N$. Bringing in both (82) and (104), we summarize with

$$\frac{\sin\varphi\sin\gamma}{2\sin^4(\varphi + \gamma)} = k_N = \zeta_N'^3 = \frac{1}{\kappa_N^2\sigma_N^3}. \tag{112}$$

Under our asymptotic assumptions (A), we have $\zeta_N(x) \to \zeta_\infty(x)$, along with its first two derivatives, uniformly on compact intervals of $(x_0, 1)$. The dependence on parameters $(\lambda, \mu)$ comes through $x_{N\pm}$, which converge to $x_{\infty\pm}$. From these considerations, we may infer a uniform bound on $\zeta_N''$:

$$\sup\{|\zeta_N''(x_N + s_N\sigma_N)| : s_N \in [s_L, s_1N^{1/6}]\} \le C. \tag{113}$$

Adapting Taylor's expansion (103), and using (104), we find that for some $s^*$ between $0$ and $s_N$, and with $x^* = x_N + s^*\sigma_N$,

$$\kappa^{2/3}\zeta(x) = s_N + \tfrac12\sigma_Ns_N^2\,\zeta''(x^*)/\zeta'_N.$$

From (112), it is evident that under assumptions (A), $\zeta'_N \to \zeta'_\infty(x_{\infty+}) \in (0,\infty)$. Hence, uniformly for $s_N \in [s_L, s_1N^{1/6}]$, we have

$$|\kappa^{2/3}\zeta - s_N| \le C\sigma_Ns_N^2. \tag{114}$$



Lemma 2. Let $r > 0$ be fixed. For $s_N \ge r^2$, we have $\kappa_N\sigma_N\sqrt{f(x)} \ge r$.

Proof. Exploiting (111) and then the last two equalities of (112), we have

$$\sqrt{f(x)} = \sqrt{(x - x_{N+})k(x)} \ge r\sqrt{\sigma_Nk_N} = r/(\kappa_N\sigma_N). \qquad \square$$
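Lemma 3. There exists $C = C(s_L)$ such that for $s_N \ge s_L$,

$$|\mathrm{Ai}(\kappa_N^{2/3}\zeta(x))| \le E^{-1}(\kappa_N^{2/3}\zeta(x)) \le Ce^{-s_N}.$$

Proof. Since $|\mathrm{Ai}(x)| \le M(x)E^{-1}(x) \le E^{-1}(x)$, it suffices to use bounds for $E^{-1}$. For $s_N \ge 1$, applying Lemma 2 with $r = 1$, we have

$$\tfrac23\kappa_N\zeta^{3/2} = \kappa_N\int_{x_N}^x\sqrt f \ge \kappa_N\int_{x_N + \sigma_N}^x\sqrt f \ge \frac{(s_N - 1)\sigma_N}{\sigma_N} = s_N - 1.$$

For $x > 0$, we have from (93) that $E^{-1}(x) \le C\exp(-\tfrac23x^{3/2})$, and so

$$E^{-1}(\kappa_N^{2/3}\zeta) \le C\exp(-\tfrac23\kappa_N\zeta^{3/2}) \le Ce^{-s_N}.$$

For $s_N \in [s_L, 1]$, it follows from (114) that $|\kappa_N^{2/3}\zeta(x)| \le C(s_L)$, and hence

$$\sup e^{s_N}E^{-1}(\kappa_N^{2/3}\zeta) \le C. \qquad \square$$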



Proof of Proposition 1.

Proof of (108). For error bounds, we use the decomposition suggested by (106):

$$\tilde\phi_N(x) - \mathrm{Ai}(s_N) = [e_Nr_N(x) - 1]\mathrm{Ai}(\kappa^{2/3}\zeta) + [\mathrm{Ai}(\kappa^{2/3}\zeta) - \mathrm{Ai}(s_N)] + e_Nr_N(x)\varepsilon_2(x,\kappa_N) = E_{N1} + E_{N2} + E_{N3}.$$

For the $E_{N1}$ term, first use (113) to conclude that for $s_N \in [s_L, s_1N^{1/6}]$, we have

$$|\zeta'(x)/\zeta'_N - 1| \le \int_{x_N}^x\frac{|\zeta''(u)|}{\zeta'_N}\,du \le Cs_N\sigma_N. \tag{115}$$

Together with (107), this yields

$$|e_Nr_N(x) - 1| \le C(1 + s_N)\sigma_N. \tag{116}$$

This argument also shows that for $s_N \in [s_L, s_1N^{1/6}]$,

$$\tfrac12 \le \zeta'(x)/\zeta'_N \le 2. \tag{117}$$

Combined with Lemma 3, we obtain

$$|E_{N1}| \le C\sigma_N(1 + s_N)e^{-s_N} \le CN^{-2/3}e^{-s_N/2}.$$

For the $E_{N2}$ term, we first observe, from (114), that for $N \ge N_0$ we have $|\kappa^{2/3}\zeta - s_N| \le s_N/4$, and hence that uniformly for $s_N \in (s_L, s_1N^{1/6})$,

$$|\mathrm{Ai}(\kappa^{2/3}\zeta) - \mathrm{Ai}(s_N)| \le C\sigma_Ns_N^2\sup\{|\mathrm{Ai}'(t)| : \tfrac34s_N \le t \le \tfrac54s_N\} \le CN^{-2/3}e^{-s_N/2},$$

where we used (118) below.
Finally, for the $E_{N3}$ term, we use (94) and the uniform bound on $\mathcal V$ (Section A.3) to get

$$|E_{N3}| \le C\kappa_N^{-1}r_N(x)M(\kappa^{2/3}\zeta)E^{-1}(\kappa^{2/3}\zeta).$$

For $s_N \in [s_L, 1]$, we observe that $M \le 1$ and $E \ge 1$, and use (107) together with (117) to conclude that

$$|E_{N3}| \le C\kappa_N^{-1}\left(\frac{\zeta'(x)}{\zeta'_N}\right)^{-1/2} \le C\kappa_N^{-1} \le CN^{-2/3} \le CN^{-2/3}e^{-s_N/2}.$$

For $s_N \in [1, s_1N^{1/6}]$, we note from (93) that $M(\kappa^{2/3}\zeta) \le C\kappa_N^{-1/6}\zeta^{-1/4}$. Since $[\hat f(x)]^{-1/4} = [f(x)]^{-1/4}\zeta^{1/4}$, we obtain from the first equality of (107) and then Lemma 2 that

$$r_N(x)M(\kappa^{2/3}\zeta) \le \frac{C}{\sqrt{\kappa_N\sigma_N}\,[f(x)]^{1/4}} \le \frac{C}{\sqrt r}.$$

Consequently, from Proposition 3, we arrive at

$$|E_{N3}| \le C\kappa_N^{-1}E^{-1}(\kappa^{2/3}\zeta) \le CN^{-2/3}e^{-s_N}. \qquad \square$$

Properties of the Airy function. $\mathrm{Ai}(s)$ satisfies the differential equation $\mathrm{Ai}''(s) = s\,\mathrm{Ai}(s)$. For all $s \ge 0$, both $\mathrm{Ai}(s) > 0$ and $\mathrm{Ai}'(s) < 0$, and so, from the differential equation, $|\mathrm{Ai}'(s)|$ is decreasing for $s \ge 0$. There are exponential decay bounds: given $s_L$, there exist constants $C_\ell(s_L)$ such that

$$|\mathrm{Ai}^{(\ell)}(s)| \le C_\ell(s_L)e^{-s}, \qquad s \ge s_L, \quad \ell = 0, 1, 2. \tag{118}$$
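The facts just listed can be checked numerically. The sketch below (an illustration, not part of the argument) evaluates $\mathrm{Ai}$ and $\mathrm{Ai}'$ from their Maclaurin series, checks $\mathrm{Ai}'' = s\,\mathrm{Ai}$ by central differences, confirms the signs and the monotonicity of $|\mathrm{Ai}'|$ on a grid in $[0, 6]$, and reports a grid estimate of $\max_\ell\sup_s e^{s}|\mathrm{Ai}^{(\ell)}(s)|$, the kind of constant $C_\ell(s_L)$ in (118). A grid maximum is only suggestive, not a proof, and the series is intended for moderate $|s|$.

```python
import math

C1 = 1.0 / (3.0 ** (2.0 / 3.0) * math.gamma(2.0 / 3.0))  # Ai(0)
C2 = 1.0 / (3.0 ** (1.0 / 3.0) * math.gamma(1.0 / 3.0))  # -Ai'(0)


def airy_ai_pair(x, n_terms=60):
    """(Ai(x), Ai'(x)) from the Maclaurin series; intended for moderate |x| (say |x| <= 6)."""
    ai = dai = 0.0
    c = d = 1.0  # coefficients of x^(3k) in f and of x^(3k+1) in g, with Ai = C1 f - C2 g
    for k in range(n_terms):
        ai += C1 * c * x ** (3 * k) - C2 * d * x ** (3 * k + 1)
        if k > 0:
            dai += C1 * 3 * k * c * x ** (3 * k - 1)
        dai -= C2 * (3 * k + 1) * d * x ** (3 * k)
        c /= (3 * k + 2) * (3 * k + 3)
        d /= (3 * k + 3) * (3 * k + 4)
    return ai, dai


def ode_residual(s, h=1e-3):
    """Central-difference estimate of Ai''(s) - s Ai(s); should be close to 0."""
    a_minus = airy_ai_pair(s - h)[0]
    a_mid = airy_ai_pair(s)[0]
    a_plus = airy_ai_pair(s + h)[0]
    return (a_plus - 2.0 * a_mid + a_minus) / h ** 2 - s * a_mid


def scaled_sup(s_low, s_high=6.0, n=600):
    """Grid estimate of max over l = 0, 1, 2 of sup e^s |Ai^(l)(s)| on [s_low, s_high]."""
    best = 0.0
    for i in range(n + 1):
        s = s_low + (s_high - s_low) * i / n
        ai, dai = airy_ai_pair(s)
        # Ai'' = s Ai, so the l = 2 term needs no separate evaluation.
        best = max(best, math.exp(s) * max(abs(ai), abs(dai), abs(s * ai)))
    return best
```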

6.2. Approximations at degree $N$ and $N - 1$. The asymptotic model used in this paper supposes that there is a sequence of eigenvalue distribution models, such as (17) and (26), indexed by $N$, the number of variables. The Jacobi parameters $\alpha = \alpha(N)$, $\beta = \beta(N)$ are regarded as functions of $N$. The kernel $S_{N,2}(x,y)$ depends on weighted polynomials $\phi_j(x) = \phi_j(x; \alpha(N), \beta(N))$, $j = 1, \dots, N$. The Christoffel–Darboux formula and the integral representation formulas (36) express $S_{N,2}$ in terms of the two functions

$$\phi_{N-1}(x; \alpha(N), \beta(N)) \qquad\text{and}\qquad \phi_N(x; \alpha(N), \beta(N)).$$

To construct approximations in the $N$th distribution model, we therefore need separate Liouville–Green asymptotic approximations to both $\phi_{N-1}$ and $\phi_N$. Each of these is defined in turn based on parameters $\lambda, \mu$ and functions $f(x; \lambda, \mu)$, $x_+(\lambda, \mu)$ and $\zeta(x; \lambda, \mu)$. In the case of $\phi_N(x)$, this uses the parameters $(N, \alpha(N), \beta(N))$, while for $\phi_{N-1}(x)$ we use $(N - 1, \alpha(N), \beta(N))$. Thus, for example, in comparing the two cases, we have

$$\kappa_N = \kappa(N; \alpha(N), \beta(N)) = 2N + \alpha + \beta + 1, \qquad \kappa_{N-1} = \kappa(N - 1; \alpha(N), \beta(N)) = 2N + \alpha + \beta - 1,$$

$$x_N = x_+(N; \alpha(N), \beta(N)), \qquad x_{N-1} = x_+(N - 1; \alpha(N), \beta(N)),$$

$$\zeta'_N = \zeta'(x_N; N, \alpha(N), \beta(N)), \qquad \zeta'_{N-1} = \zeta'(x_{N-1}; N - 1, \alpha(N), \beta(N)),$$

$$\sigma_N = (\kappa_N^{2/3}\zeta'_N)^{-1}, \qquad \sigma_{N-1} = (\kappa_{N-1}^{2/3}\zeta'_{N-1})^{-1},$$

and so forth. The analog of (42) for $\phi_{N-1}(x) = \phi_{N-1}(x; \alpha(N), \beta(N))$ states that, in terms of the variable $s_{N-1}$,

$$\left(\frac{1 - x^2}{\kappa_{N-1}\sigma_{N-1}}\right)^{1/2}\phi_{N-1}(x) = \mathrm{Ai}(s_{N-1}) + \epsilon_{N-1}, \qquad x = x_{N-1} + \sigma_{N-1}s_{N-1}, \tag{119}$$

with the error bound valid uniformly for compact intervals of $s_{N-1}$.

Airy approximation to $\tilde\phi_{N-1}$. Define $\tilde\phi_{N-1}$ correspondingly, with every occurrence of $N$ replaced by $N - 1$. In parallel with the previous approximations, now with $x = x_{N-1} + s_{N-1}\sigma_{N-1}$ and $s_L \le s_{N-1} \le CN^{1/6}$, we have

$$\tilde\phi_{N-1}(x) = \mathrm{Ai}(s_{N-1}) + O(N^{-2/3}e^{-s_{N-1}/2}), \tag{120}$$

$$\sigma_{N-1}\tilde\phi_{N-1}'(x) = \mathrm{Ai}'(s_{N-1}) + O(N^{-2/3}e^{-s_{N-1}/2}). \tag{121}$$

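We collect some formulas describing the dependence on $N$ of $\omega_N$, $x_N$ and $\sigma_N$. For these formulas we regard $\alpha$ and $\beta$ as constants not depending on $N$; in the context of the remark in the previous subsection, we are examining differences between $\phi_N$ and $\phi_{N-1}$ in the $N$th eigenvalue distribution model.

Lemma 4.

$$\frac{dx_{N\pm}}{dN} = \pm\frac{2\sin^2(\varphi\pm\gamma)}{\kappa_N\sin\varphi\sin\gamma} = \pm\frac{2(1 - x_{N\pm}^2)}{\kappa_N\sin\varphi\sin\gamma}, \tag{122}$$

$$\frac{du_N}{dN} = \frac{2}{\kappa_N\sin\varphi\sin\gamma} = O(N^{-1}), \tag{123}$$

and, in particular,

$$u_N - u_{N-1} = \frac{du_N}{dN} + O(N^{-2}), \tag{124}$$

$$\sigma_N/\sigma_{N-1},\quad \omega_N/\omega_{N-1},\quad \tau_N/\tau_{N-1} = 1 + O(N^{-1}), \tag{125}$$

$$\Delta_N := \frac{u_N - u_{N-1}}{\tau_{N-1}} = \frac{1}{\tau_N}\frac{du_N}{dN}(1 + \epsilon_N) = O(N^{-1/3}), \qquad \epsilon_N = O(N^{-1}). \tag{126}$$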
The constants implicit in the $O(N^{-1/3})$, $O(N^{-1})$ and $O(N^{-2})$ bounds depend on the ratios $\lambda = \alpha/\kappa_N$ and $\mu = \beta/\kappa_N$ defined at (63).

Proof. From (63) one obtains

$$\frac{d\kappa_N}{dN} = 2, \qquad \frac{d\lambda}{dN} = -\frac{2\alpha}{\kappa_N^2}, \qquad \frac{d\mu}{dN} = -\frac{2\beta}{\kappa_N^2},$$

and then from (68) and (69),

$$\frac{d\gamma}{dN} = \frac{2\cos\gamma}{\kappa_N\sin\gamma}, \qquad \frac{d\varphi}{dN} = \frac{2\cos\varphi}{\kappa_N\sin\varphi}, \tag{127}$$

and finally from (76),

$$\frac{dx_{N\pm}}{dN} = \sin(\varphi\pm\gamma)\left(\frac{d\varphi}{dN} \pm \frac{d\gamma}{dN}\right) = \frac{2\sin(\varphi\pm\gamma)\sin(\gamma\pm\varphi)}{\kappa_N\sin\gamma\sin\varphi}.$$

To obtain the second equality in (122), use (86) and (76). Since $du_N/dN = (\tanh^{-1})'(x_N)\,dx_N/dN$, (123) follows from (122).

Writing $b_N$ generically for each of $\sigma_N$, $\omega_N$ or $\tau_N$, we obtain (125) by writing

$$\log(b_N/b_{N-1}) = \int_{N-1}^N(\log b_t)'\,dt,$$

and verifying that $|(\log b_t)'| = O(N^{-1})$. For example, for $\omega_N = (1 - x_N^2)^{-1} = \sin^{-2}(\varphi + \gamma)$, we have

$$\frac{d(\log\omega_N)}{dN} = -\frac{2\cos(\varphi + \gamma)}{\sin(\varphi + \gamma)}\left(\frac{d\varphi}{dN} + \frac{d\gamma}{dN}\right) = O(N^{-1})$$

from (127). Similarly, writing $u'_t$, $u''_t$ for partial derivatives with respect to $N$, one verifies that $(\log u'_t)' = u''_t/u'_t = O(N^{-1})$, which shows that $u_N - u_{N-1} = u'_N + O(N^{-2})$; this establishes (124) and allows us to conclude (126) directly from (123) and (125). □

7. Unitary case: Theorems 2 and 3.

7.1. Integral representations for kernel. In the unitary setting, with eigenvalues $\{x_k\}$ having distribution (17), we have

$$P\{\max x_k \le x_0\} = \det(I - S_N\chi_0),$$

where $\chi_0(x) = I_{(x_0,1]}(x)$ and the operator $S_N\chi_0$ is defined via

$$(S_N\chi_0)g(x) = \int_{x_0}^1 S_N(x,y)g(y)\,dy.$$






Equivalently, we may speak of $S_N$ as an operator on $L^2[x_0, 1)$ with kernel $S_N(x,y)$. On this understanding, we drop further explicit reference to $\chi_0$. Consider now the effect of a change of variables $x = \tau(s)$ with $x_0 = \tau(s_0)$ and $\tau: [s_0, \infty) \to [x_0, 1)$ strictly monotonic. Clearly, with $s_k = \tau^{-1}(x_k)$, we have

$$P\{\max x_k \le x_0\} = P\{\max s_k \le s_0\},$$

while we claim (see Section A.6) that

$$\det(I - S_N) = \det(I - S_\tau), \tag{128}$$

where $S_\tau$ is the operator on $L^2(s_0, \infty)$ with kernel

$$S_\tau(s,t) = \sqrt{\tau'(s)\tau'(t)}\,S_N(\tau(s), \tau(t)). \tag{129}$$
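A minimal sanity check of (128)–(129), under the simplifying assumption of a rank-one kernel $K(x,y) = f(x)g(y)$; the functions `f`, `g` and the values of $\mu$, $\sigma$, $x_0$ below are arbitrary illustrative choices, not taken from the paper. For a rank-one kernel both determinants reduce to $1 - \operatorname{tr}K = 1 - \int fg$, and the substitution $x = \tau(s)$ with $\tau(s) = \tanh(\mu + \sigma s)$ leaves that integral unchanged. The general claim is the one proved in Section A.6.

```python
import math


def f(x):
    return 1.0 - x  # hypothetical left factor of K(x, y) = f(x) g(y)


def g(y):
    return 1.0 + y * y  # hypothetical right factor


def det_x(x0, n=20000):
    """det(I - K) on L^2[x0, 1); for rank one this is 1 - int_{x0}^1 f g dx (midpoint rule)."""
    h = (1.0 - x0) / n
    return 1.0 - h * sum(f(x0 + (i + 0.5) * h) * g(x0 + (i + 0.5) * h) for i in range(n))


def det_s(x0, mu, sig, s_max, n=40000):
    """det(I - S_tau) for S_tau(s, t) = sqrt(tau'(s) tau'(t)) K(tau(s), tau(t)), tau(s) = tanh(mu + sig s)."""
    s0 = (math.atanh(x0) - mu) / sig  # tau(s0) = x0
    h = (s_max - s0) / n
    total = 0.0
    for i in range(n):
        s = s0 + (i + 0.5) * h
        x = math.tanh(mu + sig * s)
        dtau = sig / math.cosh(mu + sig * s) ** 2
        total += dtau * f(x) * g(x)  # diagonal of the (still rank-one) transformed kernel
    return 1.0 - h * total
```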

The transformations we consider involve both a nonlinear mapping and a rescaling:

$$\tau(s) = \tau_1\circ\tau_2(s) = \tanh(\mu + \sigma s).$$

The nonlinear mapping $\tau_1(u) = \tanh u$ has already appeared in (35), giving rise to the integral representation (36) for the kernel $S_{N,2}(u,v)$. The rescaling $\tau_2(s) = \mu + \sigma s$ is used for the Airy approximation; it yields the rescaled kernel

$$S_\tau(s,t) = \sigma S_N(\mu + \sigma s, \mu + \sigma t). \tag{130}$$

The asymptotic analysis of the edge scaling of the Jacobi kernel has both local and global aspects. The first step is to establish a local Airy approximation to the weighted Jacobi polynomials appearing in (36). The approximation is centered around $x_N = x(N, \alpha, \beta)$, the upper turning point of the differential equation (73) satisfied by $\sqrt{1 - x^2}\,\phi_N(x)$; this turning point lies within $O(N^{-2/3})$ of the largest zero of $\phi_N$.

The Liouville–Green approximation is made by transforming $x$ to a new independent variable $\zeta$, and, as explained at (103)–(104), the scaling $\sigma_N = \sigma(N, \alpha, \beta)$ is defined by

$$\sigma_N = \bigl(\kappa_N^{2/3}\zeta'(x_N)\bigr)^{-1}.$$

With centering and scaling $(x_N, \sigma_N)$, we establish a local Airy approximation, for $x = x_N + s_N\sigma_N$:

$$\tilde\phi_N(x) = \frac{(1 - x^2)^{1/2}\phi_N(x)}{\sqrt{\kappa_N\sigma_N}} = \mathrm{Ai}(s_N) + O(N^{-2/3}e^{-s_N/2}), \tag{131}$$

uniformly in the variable $s_N$ in the range $s_L \le s_N \le CN^{1/6}$. A similar local approximation is shown to hold for $\tilde\phi_{N-1}(x)$ with centering $x_{N-1} = x(N - 1, \alpha, \beta)$ and scaling $\sigma_{N-1} = \sigma(N - 1, \alpha, \beta)$. The analog of (131) holds for approximating $\tilde\phi_{N-1}(x)$ by $\mathrm{Ai}(s_{N-1})$ with $x = x_{N-1} + s_{N-1}\sigma_{N-1}$ and $s_L \le s_{N-1} \le CN^{1/6}$, and with $\kappa_N$ replaced by $\kappa_{N-1}$.

To use these local approximations in the integral representation (36), we need a version that applies on the $u$-scale. Hence, we define

$$\bar\phi_N(u) = \tilde\phi_N(\tanh u), \qquad \bar\phi_{N-1}(u) = \tilde\phi_{N-1}(\tanh u). \tag{132}$$

On the $u$-scale, we have $u = u_N + \tau_Nt$ with centering $u_N$ and scaling $\tau_N$ defined by

$$x_N = \tanh u_N, \qquad \tau_N = \omega_N\sigma_N, \qquad \omega_N = (1 - x_N^2)^{-1}. \tag{133}$$

These definitions are suggested by the approximation

$$\tanh(u_N + \tau_Nt) \approx \tanh u_N + \tau_Nt\tanh'u_N = x_N + \sigma_Nt,$$

where we have used

$$\tanh'u_N = 1/(\tanh^{-1})'(x_N) = 1 - x_N^2 = \omega_N^{-1}.$$

With these definitions, and defining $u_{N-1}$, $\tau_{N-1}$ and $\omega_{N-1}$ in the corresponding way, it is straightforward (Section 7.2) to show that for $t_L \le t \le CN^{1/6}$,

$$\bar\phi_N(u_N + \tau_Nt) = \mathrm{Ai}(t) + O(N^{-2/3}e^{-t/2}),$$

$$\bar\phi_{N-1}(u_{N-1} + \tau_{N-1}t) = \mathrm{Ai}(t) + O(N^{-2/3}e^{-t/2}).$$
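The linearization behind (133) can be quantified. By Taylor's theorem, $|\tanh(u_N + \tau_Nt) - (x_N + \sigma_Nt)| \le \tfrac12\tau_N^2t^2\sup|\tanh''|$, and $\sup|\tanh''| = 4/(3\sqrt3)$. The short sketch below checks this bound for a few hypothetical values of $x_N$, $\sigma_N$ and $t$; it mirrors the remainder estimate used later at (154)–(155).

```python
import math

# sup |tanh''| = 4 / (3 sqrt 3), so the Taylor remainder is at most BOUND * tau^2 t^2.
BOUND = 0.5 * 4.0 / (3.0 * math.sqrt(3.0))


def linearization_error(x_n, sigma_n, t):
    """|tanh(u_N + tau_N t) - (x_N + sigma_N t)|, with u_N, omega_N, tau_N as in (133)."""
    u_n = math.atanh(x_n)
    omega_n = 1.0 / (1.0 - x_n ** 2)
    tau_n = omega_n * sigma_n
    return abs(math.tanh(u_n + tau_n * t) - (x_n + sigma_n * t)), tau_n
```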

We thus obtain good local Airy approximations for both $\bar\phi_N$ and $\bar\phi_{N-1}$, but with differing centering and scaling values. Our goal is a scaling limit with error term for the kernel $S_N$, and so the centering and scaling for $S_N$ will need to combine those for $\bar\phi_N$ and $\bar\phi_{N-1}$ in some fashion. In addition, the integral representation for $S_N$ involves global features of $\phi_N$ and $\phi_{N-1}$, through the transformation $x = \tanh u$. Thus, we use a rescaling $u = \mu + \sigma s$, with the explicit values of $(\mu, \sigma)$ given for the real and complex cases in Section 7.3 below, and put

$$\phi_\tau(s) = \bar\phi_N(\mu + \sigma s), \qquad \psi_\tau(s) = \bar\phi_{N-1}(\mu + \sigma s). \tag{134}$$

We now convert (36) into a representation in terms of $\phi_\tau$ and $\psi_\tau$. First, observe from (134) and (132) that

$$\phi_\tau(s) = \tilde\phi_N(\tanh(\mu + \sigma s)), \qquad \psi_\tau(s) = \tilde\phi_{N-1}(\tanh(\mu + \sigma s)).$$

From the definition of $\tilde\phi_N$ in (131), we also have

$$\tilde\phi_N(\tanh u) = \frac{\phi_N(\tanh u)}{\sqrt{\kappa_N\sigma_N}\cosh u} = \frac{\phi_N(u)}{\sqrt{\kappa_N\sigma_N}}, \tag{135}$$

with a corresponding identity for $\tilde\phi_{N-1}$. Combining the last two displays yields

$$\phi_N(\mu + \sigma s) = \sqrt{\kappa_N\sigma_N}\,\phi_\tau(s), \qquad \phi_{N-1}(\mu + \sigma s) = \sqrt{\kappa_{N-1}\sigma_{N-1}}\,\psi_\tau(s).$$
From (130) and (36), and a change of variables $u = \mu + \sigma s$, $v = \mu + \sigma t$ and $w = \sigma z$,

$$S_\tau(s,t) = \frac{\sigma^2}{2}(\kappa_N - 1)a_N\int_0^\infty\bigl[\phi_N(\mu + \sigma(s+z))\phi_{N-1}(\mu + \sigma(t+z)) + \phi_{N-1}(\mu + \sigma(s+z))\phi_N(\mu + \sigma(t+z))\bigr]dz$$

$$= \frac{\sigma^2}{2}(\kappa_N - 1)a_N\sqrt{\sigma_N\kappa_N\sigma_{N-1}\kappa_{N-1}}\int_0^\infty\bigl[\phi_\tau(s+z)\psi_\tau(t+z) + \psi_\tau(s+z)\phi_\tau(t+z)\bigr]dz.$$

Thus, we arrive at

$$S_\tau(s,t) = \sigma S_N(\mu + \sigma s, \mu + \sigma t) = e_N\tilde S_\tau(s,t), \tag{136}$$

where

$$\tilde S_\tau(s,t) = \frac12\int_0^\infty\bigl[\phi_\tau(s+z)\psi_\tau(t+z) + \psi_\tau(s+z)\phi_\tau(t+z)\bigr]dz \tag{137}$$

and

$$e_N := \sigma^2(\kappa_N - 1)a_N\sqrt{\sigma_N\sigma_{N-1}\kappa_N\kappa_{N-1}} = 1 + O(N^{-1}). \tag{138}$$

Proof. Using (122) and (126) combined with $\limsup|x_N| < 1$, we find that both $\sigma_N/\sigma_{N-1}$ and $\omega_N/\omega_{N-1}$ are $1 + O(N^{-1})$, and so

$$e_N = \sigma_N^3\kappa_N^2\omega_N^2a_N\bigl(1 + O(N^{-1})\bigr). \tag{139}$$

Combining (82) and (133) for $\omega_N$ with (112), we obtain

$$\tfrac14\omega_N^2(x_{N+} - x_{N-}) = k_N = \frac{1}{\kappa_N^2\sigma_N^3}. \tag{140}$$

From Lemma 1 and (78), $\tfrac14(x_{N+} - x_{N-}) = a_N(1 + O(N^{-1}))$, and so (140) shows that, indeed, $e_N = 1 + O(N^{-1})$. □

To summarize, we have (with $u_k = \tanh^{-1}x_k$)

$$P\{(\max u_k - \mu)/\sigma \le s_0\} = \det(I - S_\tau). \tag{141}$$

The Tracy–Widom distribution is

$$F_2(s_0) = \det(I - S_A),$$

where the Airy kernel is

$$S_A(s,t) = \int_0^\infty\mathrm{Ai}(s+z)\mathrm{Ai}(t+z)\,dz. \tag{142}$$
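For readers who want numbers for $F_2$, here is a sketch of one standard way to evaluate $\det(I - S_A)$ on $L^2(s_0,\infty)$: a Gauss–Legendre (Nyström) discretization of the Fredholm determinant. This is a generic numerical technique, not the method of this paper. It uses the well-known closed form $S_A(s,t) = [\mathrm{Ai}(s)\mathrm{Ai}'(t) - \mathrm{Ai}'(s)\mathrm{Ai}(t)]/(s - t)$, with diagonal $\mathrm{Ai}'(s)^2 - s\,\mathrm{Ai}(s)^2$, which is equivalent to (142). Truncating $(s_0,\infty)$ to $(s_0, s_0 + 8)$, the number of nodes, and the series evaluation of $\mathrm{Ai}$ are all assumptions of this sketch.

```python
import math

C1 = 1.0 / (3.0 ** (2.0 / 3.0) * math.gamma(2.0 / 3.0))  # Ai(0)
C2 = 1.0 / (3.0 ** (1.0 / 3.0) * math.gamma(1.0 / 3.0))  # -Ai'(0)


def airy_pair(x, n_terms=60):
    """(Ai(x), Ai'(x)) via the Maclaurin series; adequate for roughly -6 <= x <= 8."""
    ai = dai = 0.0
    c = d = 1.0
    for k in range(n_terms):
        ai += C1 * c * x ** (3 * k) - C2 * d * x ** (3 * k + 1)
        if k > 0:
            dai += C1 * 3 * k * c * x ** (3 * k - 1)
        dai -= C2 * (3 * k + 1) * d * x ** (3 * k)
        c /= (3 * k + 2) * (3 * k + 3)
        d /= (3 * k + 3) * (3 * k + 4)
    return ai, dai


def gauss_legendre(m):
    """Gauss-Legendre nodes and weights on [-1, 1] by Newton's method on P_m."""
    nodes, weights = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))
        for _ in range(100):
            p0, p1 = 1.0, x
            for k in range(2, m + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = m * (x * p1 - p0) / (x * x - 1.0)  # P_m'(x)
            step = p1 / dp
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    result = 1.0
    for j in range(n):
        p = max(range(j, n), key=lambda r: abs(a[r][j]))
        if a[p][j] == 0.0:
            return 0.0
        if p != j:
            a[j], a[p] = a[p], a[j]
            result = -result
        result *= a[j][j]
        for r in range(j + 1, n):
            factor = a[r][j] / a[j][j]
            for col in range(j, n):
                a[r][col] -= factor * a[j][col]
    return result


def tracy_widom_f2(s0, length=8.0, m=40):
    """Nystrom approximation of F_2(s0) = det(I - S_A) on L^2(s0, s0 + length)."""
    xi, wi = gauss_legendre(m)
    half = 0.5 * length
    nodes = [s0 + half * (1.0 + x) for x in xi]
    weights = [half * w for w in wi]
    vals = [airy_pair(s) for s in nodes]
    mat = []
    for i in range(m):
        ai_i, dai_i = vals[i]
        row = []
        for j in range(m):
            ai_j, dai_j = vals[j]
            if i == j:
                kern = dai_i ** 2 - nodes[i] * ai_i ** 2  # diagonal limit of the kernel
            else:
                kern = (ai_i * dai_j - dai_i * ai_j) / (nodes[i] - nodes[j])
            delta = 1.0 if i == j else 0.0
            row.append(delta - math.sqrt(weights[i] * weights[j]) * kern)
        mat.append(row)
    return det(mat)
```

The symmetric weighting $\sqrt{w_iw_j}$ keeps the discretized operator symmetric, matching the symmetry of $S_A$.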

To bound the convergence rate of (141) to $F_2(s_0)$, we use the Seiler–Simon bound (32). To bound $S_\tau - S_A$, we use a simple algebraic identity:

$$4[\phi\bar\psi + \psi\bar\phi - 2a\bar a] = (\phi + \psi + 2a)(\bar\phi + \bar\psi - 2\bar a) + (\phi + \psi - 2a)(\bar\phi + \bar\psi + 2\bar a) - 2(\phi - \psi)(\bar\phi - \bar\psi). \tag{143}$$
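Identity (143) is elementary algebra; the sketch below simply verifies it on random inputs, with `phib`, `psib` and `ab` standing for $\bar\phi$, $\bar\psi$ and $\bar a$ (the quantities evaluated at $t$ rather than $s$).

```python
import random


def lhs(phi, psi, a, phib, psib, ab):
    return 4.0 * (phi * psib + psi * phib - 2.0 * a * ab)


def rhs(phi, psi, a, phib, psib, ab):
    return ((phi + psi + 2.0 * a) * (phib + psib - 2.0 * ab)
            + (phi + psi - 2.0 * a) * (phib + psib + 2.0 * ab)
            - 2.0 * (phi - psi) * (phib - psib))
```

The value of the rearrangement is that every product on the right has a small factor: $\phi + \psi - 2a$ (or its barred version) is controlled at the $N^{-2/3}$ rate, while $\phi - \psi$ and $\bar\phi - \bar\psi$ are each of order $N^{-1/3}$, so their product is again of order $N^{-2/3}$.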
Inspection of (143) shows that the essential bounds on simultaneous Airy approximation of $\phi_\tau$ and $\psi_\tau$ are those given in the following lemma. Set

$$\mu = \frac{\tau_{N-1}u_N + \tau_Nu_{N-1}}{\tau_N + \tau_{N-1}}, \qquad \frac1\sigma = \frac12\left(\frac1{\tau_N} + \frac1{\tau_{N-1}}\right). \tag{144}$$



Lemma 5 (Complex case). There exists $C = C(\cdots)$ such that for $s \ge s_L$,

$$|\phi_\tau(s)| \le Ce^{-s}, \tag{145}$$

$$|\psi_\tau(s)| \le Ce^{-s}, \tag{146}$$

$$|\phi_\tau(s) - \mathrm{Ai}(s)| \le CN^{-1/3}e^{-s/4}, \tag{147}$$

$$|\psi_\tau(s) - \mathrm{Ai}(s)| \le CN^{-1/3}e^{-s/4}, \tag{148}$$

$$|\phi_\tau(s) + \psi_\tau(s) - 2\mathrm{Ai}(s)| \le CN^{-2/3}e^{-s/4}. \tag{149}$$

We remark that the same bounds trivially hold if $\phi_\tau(s)$ and $\psi_\tau(s)$ are replaced by $\sqrt{e_N}\,\phi_\tau(s)$ and $\sqrt{e_N}\,\psi_\tau(s)$, respectively.

Of these bounds, the most critical is (149), which provides the $N^{-2/3}$ rate of convergence. We wish here also to acknowledge the influence of El Karoui (2006), whose methods allowed the formulation and proof of Lemma 5, which improved on our earlier, less rigorous approach.

Inspecting (137) and (142), and setting $\phi$, $\psi$ and $a$ equal to $\sqrt{e_N}\,\phi_\tau(s+z)$, $\sqrt{e_N}\,\psi_\tau(s+z)$ and $\mathrm{Ai}(s+z)$, respectively, and $\bar\phi$, $\bar\psi$ and $\bar a$ to the corresponding quantities with $t$ in place of $s$, we are led to an expression for $K_N = S_\tau - S_A$ having the form

$$K_N(s,t) = \int_0^\infty\sum_{i=1}^3a_i(s+z)b_i(t+z)\,dz.$$

Lemma 5 leads to bounds of the form

$$|a_i(s)| \le a_{Ni}e^{-\alpha s}, \qquad |b_i(s)| \le b_{Ni}e^{-\alpha s}, \qquad s \ge s_0, \tag{150}$$

and hence

$$\|K_N\|_1 \le \frac{e^{-2\alpha s_0}}{4\alpha^2}\sum_{i=1}^3a_{Ni}b_{Ni}. \tag{151}$$

To make explicit the role of the rate bounds in Lemma 5, we may write, in the case of $K_N = S_\tau - S_A$, with $\alpha = 1/4$,

$$\|S_\tau - S_A\|_1 \le C\bigl(1\cdot N^{-2/3} + N^{-2/3}\cdot1 + N^{-1/3}\cdot N^{-1/3}\bigr)e^{-s_0/2} = CN^{-2/3}e^{-s_0/2}.$$

Proof of (151). Recall first that the trace class norm of a rank-1 operator $\phi\otimes\psi$ is just $\|\phi\|_2\|\psi\|_2$. Since the trace norm of an integral is at most the integral of the trace norms,

$$\|K_N\|_1 \le \int_0^\infty\sum_i\|a_i(\cdot + z)\|_2\|b_i(\cdot + z)\|_2\,dz.$$

From (150), we have $\|a_i(\cdot + z)\|_2^2 \le a_{Ni}^2\int_{s_0}^\infty e^{-2\alpha(s+z)}\,ds = a_{Ni}^2e^{-2\alpha(s_0+z)}/(2\alpha)$, and so, after further integration, we obtain (151). □
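The two integrations in the proof of (151) can be checked numerically for a single pair $a(s) = a_Ne^{-\alpha s}$, $b(s) = b_Ne^{-\alpha s}$; the amplitudes and $s_0$ used in the check are arbitrary illustrative values. The inner norm should match $a_Ne^{-\alpha(s_0+z)}/\sqrt{2\alpha}$, and integrating the product of norms over $z \ge 0$ should give $a_Nb_Ne^{-2\alpha s_0}/(4\alpha^2)$; with $\alpha = 1/4$ the constant is $4e^{-s_0/2}$.

```python
import math


def shifted_l2_norm(amp, alpha, s0, z, length=200.0, n=20000):
    """Numerical ||a(. + z)||_2 over [s0, s0 + length] for a(s) = amp * exp(-alpha s)."""
    h = length / n
    total = 0.0
    for i in range(n):
        s = s0 + (i + 0.5) * h
        total += (amp * math.exp(-alpha * (s + z))) ** 2
    return math.sqrt(total * h)


def integrated_norm_product(amp_a, amp_b, alpha, s0, length=200.0, n=20000):
    """int_0^inf ||a(. + z)||_2 ||b(. + z)||_2 dz, using the closed-form norms from the proof."""
    h = length / n
    total = 0.0
    for i in range(n):
        z = (i + 0.5) * h
        na = amp_a * math.exp(-alpha * (s0 + z)) / math.sqrt(2.0 * alpha)
        nb = amp_b * math.exp(-alpha * (s0 + z)) / math.sqrt(2.0 * alpha)
        total += na * nb
    return total * h


def trace_norm_bound(amps_a, amps_b, alpha, s0):
    """Right side of (151): exp(-2 alpha s0) / (4 alpha^2) * sum_i a_Ni b_Ni."""
    return math.exp(-2.0 * alpha * s0) / (4.0 * alpha ** 2) * sum(
        a * b for a, b in zip(amps_a, amps_b))
```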

Remark (Summing up: complex case). In conjunction with (141) and (32), this leads to the bound in Theorem 3. Theorem 2 is derived from Theorem 3 in exactly the same manner as Theorem 1 is deduced from Theorem 4, as is detailed in Section 8.4.3. In particular, the logit-scale quantities $w_N$ and $\omega_N$ below Theorem 2 are given by $w_N = 2\mu$ and $\omega_N = 2\sigma$.

7.2. Local bounds: $u$-scale.

Proposition 2. For $t_L \le t \le CN^{1/6}$,

$$\bar\phi_N(u_N + \tau_Nt) = \mathrm{Ai}(t) + O(N^{-2/3}e^{-t/2}),$$

$$\tau_N\bar\phi_N'(u_N + \tau_Nt) = \mathrm{Ai}'(t) + O(N^{-2/3}e^{-t/2}), \tag{152}$$

and similarly

$$\bar\phi_{N-1}(u_{N-1} + \tau_{N-1}t) = \mathrm{Ai}(t) + O(N^{-2/3}e^{-t/2}),$$

$$\tau_{N-1}\bar\phi_{N-1}'(u_{N-1} + \tau_{N-1}t) = \mathrm{Ai}'(t) + O(N^{-2/3}e^{-t/2}). \tag{153}$$

Proof. Consider first $\bar\phi_N(u) = \tilde\phi_N(\tanh(u_N + \tau_Nt))$. By Taylor expansion,

$$\tanh(u_N + \tau_Nt) = \tanh u_N + \tau_Nt\tanh'u_N + \tfrac12\tau_N^2t^2\tanh''(u^*) = x_N + \sigma_N(t + \epsilon_N(t)), \tag{154}$$

where we use $\tanh'u_N = 1/\omega_N$, and note that for $t_L \le t \le CN^{1/6}$,

$$|\epsilon_N(t)| = |(\omega_N/2)\tau_Nt^2\tanh''(u^*)| \le C\tau_Nt^2 \le Ct^2N^{-2/3} \le CN^{-1/3}. \tag{155}$$

Consequently, from (108),

$$\bar\phi_N(u) = \tilde\phi_N(x_N + \sigma_N(t + \epsilon_N(t))) = \mathrm{Ai}(t + \epsilon_N(t)) + O(N^{-2/3}e^{-(t + \epsilon_N(t))/2}) = \mathrm{Ai}(t) + O(N^{-2/3}e^{-t/2}),$$

after we appeal to (155) and also observe, using the remarks before (118), that

$$|\mathrm{Ai}(t + \epsilon_N(t)) - \mathrm{Ai}(t)| \le |\epsilon_N(t)|\sup\{|\mathrm{Ai}'(u)| : t - |\epsilon_N(t)| \le u \le t + |\epsilon_N(t)|\} \le CN^{-2/3}t^2|\mathrm{Ai}'(t - CN^{-1/3})| \le O(N^{-2/3}e^{-t/2}). \tag{156}$$

Turn now to

$$\tau_N\bar\phi_N'(u) = \tau_N(\tanh'u)\tilde\phi_N'(\tanh u) = (\omega_N\tanh'u)\,\sigma_N\tilde\phi_N'(x_N + \sigma_N(t + \epsilon_N(t))).$$

Noting that $\omega_N\tanh'u = \omega_N\tanh'u_N + \omega_N\tau_Nt\tanh''(u^*) = 1 + O(tN^{-2/3})$, we find

$$\tau_N\bar\phi_N'(u) = [1 + O(N^{-2/3}t)]\bigl[\mathrm{Ai}'(t + \epsilon_N(t)) + O(N^{-2/3}e^{-(t + \epsilon_N(t))/2})\bigr].$$

The argument at (156) applies equally with $\mathrm{Ai}'$ in place of $\mathrm{Ai}$, and so

$$\tau_N\bar\phi_N'(u) = [1 + O(N^{-2/3}t)]\bigl[\mathrm{Ai}'(t) + O(N^{-2/3}e^{-t/2})\bigr] = \mathrm{Ai}'(t) + O(N^{-2/3}e^{-t/2}).$$

The arguments for $\bar\phi_{N-1}$ and $\tau_{N-1}\bar\phi_{N-1}'$ are entirely similar. □

7.3. Global bounds. The global bounds that we need for Airy approximation to $\phi_\tau$ and $\psi_\tau$ are very similar in the real and complex cases. The differences in the two statements arise first from the changes in choice of centering $\mu$ and scaling $\sigma$, and second because bounds on convergence of derivatives are also required for the real case.

In the complex case, use (144), and in the real case put

$$\mu = u_N, \qquad \sigma = \tau_N. \tag{157}$$

In either case, set

$$\phi_\tau(t) = \bar\phi_N(\mu + \sigma t) = \tilde\phi_N(\tanh(\mu + \sigma t)),$$

$$\psi_\tau(t) = \bar\phi_{N-1}(\mu + \sigma t) = \tilde\phi_{N-1}(\tanh(\mu + \sigma t)).$$

The results for the complex case were given in Lemma 5.

Lemma 6 (Real case). There exists $C = C(\cdots)$ such that for $s \ge s_L$,

$$|\phi_\tau(s)|,\ |\psi_\tau(s)| \le Ce^{-s}, \tag{158}$$

$$|\phi_\tau'(s)|,\ |\psi_\tau'(s)| \le Ce^{-s}, \tag{159}$$

$$|\phi_\tau(s) - \mathrm{Ai}(s)| \le CN^{-2/3}e^{-s/4}, \qquad |\phi_\tau'(s) - \mathrm{Ai}'(s)| \le CN^{-2/3}e^{-s/4}, \tag{160}$$

$$|\psi_\tau(s) - \mathrm{Ai}(s) - \Delta_N\mathrm{Ai}'(s)| \le CN^{-2/3}e^{-s/4}, \qquad |\psi_\tau'(s) - \mathrm{Ai}'(s) - \Delta_N\mathrm{Ai}''(s)| \le CN^{-2/3}e^{-s/4}. \tag{161}$$

A trivial corollary of Lemma 6 that will also be needed in the real case is that right-tail integrals $(\mathcal I\psi)(s) = \int_s^\infty\psi(t)\,dt$ satisfy bounds of the type (158)–(161) whenever $\psi$ does.

We shall give proofs of both complex and real cases, indicating the parts that are in common and those that diverge. Proofs for the bounds involving $\phi_\tau'$ and $\psi_\tau'$ are deferred to Section A.8.



Bounds for $\phi_\tau$, $\psi_\tau$. Combine (106) with the bound $|\mathrm{Ai}(x)| \le M(x)E^{-1}(x)$ and with (94). Since $\mathcal V(\zeta)$ is bounded (Section A.3), we have, for $x = \tanh(\mu + \sigma s)$,

$$|\phi_\tau(s)| \le Ce_Nr_N(x)M(\kappa_N^{2/3}\zeta_N)E^{-1}(\kappa_N^{2/3}\zeta_N),$$

where $e_N$ and $r_N(x)$ are defined near (107). Here and in the argument below, analogous bounds hold for $\psi_\tau$ with $N$ replaced by $N - 1$. We consider two cases: $s \in [s_L, s_1]$ and $s \in (s_1, \infty)$.

(i) Large $s$. First we argue, as for the $E_{N3}$ term in the local bound, that

$$r_N(x)M(\kappa_N^{2/3}\zeta_N) \le C/\sqrt r. \tag{162}$$

(This argument is valid for all $s \ge s_1$ and for the $N - 1$ case.) For the exponential term, it is convenient to modify slightly the argument used for Lemma 2 and Proposition 3. We suppose that $s_1$ is chosen so that $x \ge \tau(s_1) \ge \max(x_N + r^2\sigma_N, x_{N-1} + r^2\sigma_{N-1})$. Then we have (using the $N$-case as lead example)

$$\sqrt{f(x)} \ge \frac{r\sqrt{\sigma_N(x_{N+} - x_{N-})}}{2(1 - x^2)},$$

so as to exploit the inverse hyperbolic tangent integral

$$\int_{\tau(s_1)}^{\tau(s)}\frac{dx}{1 - x^2} = \tanh^{-1}\tau(s) - \tanh^{-1}\tau(s_1) = \sigma(s - s_1),$$

to conclude, using (140), that

$$\tfrac23\kappa_N\zeta_N^{3/2} = \kappa_N\int_{x_N}^x\sqrt f \ge \tfrac12\kappa_Nr\sqrt{\sigma_N(x_{N+} - x_{N-})}\,\sigma(s - s_1) = \frac{r\sigma}{\sigma_N\omega_N}(s - s_1).$$




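Of course, for $\psi_\tau$, the denominator of the last expression is $\sigma_{N-1}\omega_{N-1}$. If $N \ge N_0$ is chosen large enough so that

$$\frac34 \le \frac{\sigma_N\omega_N}{\sigma_{N-1}\omega_{N-1}} \le \frac43,$$

then in both real and complex cases

$$\frac{\sigma}{\sigma_N\omega_N},\ \frac{\sigma}{\sigma_{N-1}\omega_{N-1}} \ge \frac34,$$

so that if we take, say, $r = 4/3$, then in all cases

$$\frac{r\sigma}{\sigma_N\omega_N}(s - s_1) \ge s - s_1.$$

Since $E^{-1}(x) \le C\exp(-\tfrac23x^{3/2})$, we have

$$E^{-1}(\kappa_N^{2/3}\zeta_N) \le Ce^{-\frac23\kappa_N\zeta_N^{3/2}} \le C\exp(-(s - s_1)) \le Ce^{-s}. \tag{163}$$

Combining (162) and (163), we get, for $s \ge s_1$, that

$$|\phi_\tau(s)|,\ |\psi_\tau(s)| \le Ce^{-s}.$$

(ii) Small $s \in [s_L, s_1]$. We use the bounds $M \le 1$ and $E \ge 1$, and use (107) together with (113) to conclude that

$$|\phi_\tau(s)| \le Cr_N(x) \le C \le Ce^{-s}.$$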

For the Airy approximation bounds, we must first paste together the local bounds of Proposition 2, derived separately for indices $N$ and $N - 1$, into the single scaling $\mu + \sigma t$, and then, second, develop adequate bounds for $t \ge t_1N^{1/6}$.

Proof of (149). We establish this first, as it indicates the reason for the choices (144). In order to use (152) and (153), we set

$$\mu + \sigma t = u_N + \tau_Nt_N = u_{N-1} + \tau_{N-1}t_{N-1}, \tag{164}$$

which yields

$$\phi_\tau(t) + \psi_\tau(t) = \bar\phi_N(u_N + \tau_Nt_N) + \bar\phi_{N-1}(u_{N-1} + \tau_{N-1}t_{N-1}) = \mathrm{Ai}(t_N) + \mathrm{Ai}(t_{N-1}) + O\bigl(N^{-2/3}(e^{-t_N/2} + e^{-t_{N-1}/2})\bigr). \tag{165}$$

Rewriting (164), we have $t_j = t + c_j + d_jt$, where

$$c_j = \tau_j^{-1}(\mu - u_j), \qquad d_j = \sigma\tau_j^{-1} - 1, \qquad j = N - 1, N.$$

Consequently, we have both

$$\mathrm{Ai}(t_N) = \mathrm{Ai}(t) + (c_N + d_Nt)\mathrm{Ai}'(t) + \tfrac12(c_N + d_Nt)^2\mathrm{Ai}''(t_N^*),$$

$$\mathrm{Ai}(t_{N-1}) = \mathrm{Ai}(t) + (c_{N-1} + d_{N-1}t)\mathrm{Ai}'(t) + \tfrac12(c_{N-1} + d_{N-1}t)^2\mathrm{Ai}''(t_{N-1}^*).$$

In the approximation for $\phi_\tau + \psi_\tau$, the terms in $\mathrm{Ai}'(t)$ drop out if we choose $\mu$ and $\sigma$ so that $c_N + c_{N-1} = d_N + d_{N-1} = 0$, and this leads immediately to expressions (144).
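Solving the two linear conditions $c_N + c_{N-1} = 0$ and $d_N + d_{N-1} = 0$ makes (144) explicit: the second condition gives $\sigma$ as the harmonic mean of $\tau_N$ and $\tau_{N-1}$, and the first gives $\mu$ as a weighted average of $u_N$ and $u_{N-1}$. The sketch below checks this with hypothetical values of $u_j$ and $\tau_j$, and also recovers the closed forms for $c_N$ and $d_N$ used next.

```python
def centering_scaling(u_n, u_nm1, tau_n, tau_nm1):
    """(mu, sigma) from (144)."""
    sigma = 2.0 / (1.0 / tau_n + 1.0 / tau_nm1)  # harmonic mean of tau_N and tau_{N-1}
    mu = (tau_nm1 * u_n + tau_n * u_nm1) / (tau_n + tau_nm1)
    return mu, sigma


def c_and_d(mu, sigma, u_j, tau_j):
    """c_j = (mu - u_j)/tau_j and d_j = sigma/tau_j - 1, from t_j = t + c_j + d_j t."""
    return (mu - u_j) / tau_j, sigma / tau_j - 1.0
```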

With these choices of $\mu$ and $\sigma$, we find from (125) and (126) that

$$c_N = \frac{u_{N-1} - u_N}{\tau_{N-1} + \tau_N} = O(N^{-1/3}), \qquad d_N = \frac{\tau_{N-1} - \tau_N}{\tau_{N-1} + \tau_N} = O(N^{-1}),$$

so that

$$\sup\{|c_N + d_Nt| : t_L \le t \le t_1N^{1/6}\} \le CN^{-1/3}, \tag{166}$$

and so

$$\mathrm{Ai}(t_N) + \mathrm{Ai}(t_{N-1}) = 2\mathrm{Ai}(t) + O\bigl(N^{-2/3}(|\mathrm{Ai}''(t_N^*)| + |\mathrm{Ai}''(t_{N-1}^*)|)\bigr).$$

We conclude for $j = N - 1, N$ that

$$t_j \ge t - CN^{-1/3} \qquad\text{and}\qquad |\mathrm{Ai}''(t_j^*)| \le Ce^{-t/2}. \tag{167}$$

Combining these bounds with (165) establishes (149) for $t \in [t_L, t_1N^{1/6}]$.

For $t \ge t_1N^{1/6}$, crude arguments suffice. Indeed, from (145) [and with a similar argument for $\psi_\tau(t)$],

$$|\phi_\tau(t) - \mathrm{Ai}(t)| \le |\phi_\tau(t)| + |\mathrm{Ai}(t)| \le Ce^{-t} + Ce^{-t} \tag{168}$$

$$\le CN^{-2/3}e^{-t/4}. \tag{169}$$

Bound for $\phi_\tau(s) - \mathrm{Ai}(s)$. For $t \ge t_1N^{1/6}$, we may reuse the bounds for $\phi_\tau$ (and $\psi_\tau$) at (168). Consider, then, the interval $t \in [t_L, t_1N^{1/6}]$. In the real case, since $\mu = u_N$ and $\sigma = \tau_N$, the bound needed is already established at (152). In the complex case, combining elements from the argument above, we have

$$\phi_\tau(t) = \mathrm{Ai}(t) + (c_N + d_Nt)\mathrm{Ai}'(t^*) + O(N^{-2/3}e^{-t_N/2}),$$

and (147) follows from (166) and (167). The proof of (148) is analogous.

Real case bound for $\psi_\tau(s) - \mathrm{Ai}(s) - \Delta_N\mathrm{Ai}'(s)$. As with earlier cases, the real work lies in $s \in [s_L, s_1N^{1/6}]$. From the definitions, $\psi_\tau(t) = \bar\phi_{N-1}(\mu + \sigma t) = \bar\phi_{N-1}(u_N + \tau_Nt)$. In order to use (153), we write $u_N + \tau_Nt = u_{N-1} + \tau_{N-1}t'$, so that

$$\psi_\tau(t) = \mathrm{Ai}(t') + O(N^{-2/3}e^{-t'/2}),$$

where, using (126),

$$t' - t = \Delta_N + (\tau_N\tau_{N-1}^{-1} - 1)t = \Delta_N + O(tN^{-1}). \tag{170}$$

Consequently,

$$\mathrm{Ai}(t') = \mathrm{Ai}(t) + [\Delta_N + O(tN^{-1})]\mathrm{Ai}'(t) + \tfrac12[\Delta_N + O(tN^{-1})]^2\mathrm{Ai}''(t^*), \tag{171}$$

and since (126) shows that $\Delta_N = O(N^{-1/3})$, we conclude that

$$\mathrm{Ai}(t') = \mathrm{Ai}(t) + \Delta_N\mathrm{Ai}'(t) + O(N^{-2/3}e^{-t/2}). \tag{172}$$

Since (170) shows that $e^{-t'/2} = e^{-t/2 + O(N^{-1/3})} \le Ce^{-t/2}$ for $t \le t_1N^{1/6}$, we obtain the desired bound in (161) for $\psi_\tau(t) - \mathrm{Ai}(t) - \Delta_N\mathrm{Ai}'(t)$.

8. Orthogonal case: Theorems 1 and 4.

8.1. Derivation of (48)–(50). Tracy and Widom (1998) provide a direct derivation of Fredholm determinant representations for eigenvalue probabilities that avoids the introduction of quaternion determinants. We first review this, with the aim of then making the connection to the results of Adler et al. (2000) (abbreviated as AFNM below), so as to obtain the explicit representation (50) of $S_{N+1,1}$ as a rank-1 modification of a multiple of the unitary kernel $S_{N,2}$.

Tracy–Widom derivation. Accordingly, we now fix notation, and specify precisely our use of Tracy and Widom (1998). In the case of the Jacobi orthogonal ensemble (26), the weight function is $w(x) = (1 - x)^{(\alpha-1)/2}(1 + x)^{(\beta-1)/2}$. Setting $f(x) = -I\{x > x_0\} = -\chi_0(x)$, we write the exceedance probability in the form used in Tracy and Widom [(1998), Section 9]:

$$P\Bigl\{\max_{1\le k\le N+1}x_k \le x_0\Bigr\} = E\prod_{k=1}^{N+1}(1 - I\{x_k > x_0\}) \propto \int\cdots\int\prod_{j<k}|x_j - x_k|\prod_jw(x_j)\prod_j(1 + f(x_j))\,dx_1\cdots dx_{N+1}.$$

The argument of Tracy and Widom [(1998), Section 9] establishes that this probability satisfies (49) with $S_{N+1,1}$ expressed in the form

$$S_{N+1,1}(x,y) = -\sum_{j,k=0}^N\psi_j(x)\mu_{jk}(\varepsilon\psi_k)(y). \tag{173}$$

Here $\psi_j(x) = p_j(x)w(x)$ and $\{p_j(x), j = 0, \dots, N\}$ is an arbitrary sequence of polynomials of exact degree $j$. The coefficients $\{\mu_{jk}\}$ are the entries of $M^{-1}$, where

$$M_{jk} = \iint\varepsilon(x - y)\psi_j(x)\psi_k(y)\,dx\,dy.$$

The function $\varepsilon(x) = \frac12\operatorname{sgn}(x)$, and, as usual,

$$(\varepsilon\psi_k)(y) = \int\varepsilon(y - z)\psi_k(z)\,dz.$$

The next step is to make a specific choice of polynomials $p_j$ and hence $\psi_j$.

Connecting to AFNM. The key to AFNM's summation formula linking orthogonal and unitary ensembles is the observation that if the unitary ensemble has weight function

$$w_2(x) = e^{-2V(x)} = (1 - x)^\alpha(1 + x)^\beta,$$

then the corresponding orthogonal ensemble should have a modified weight function

$$w_1(x) = e^{-\tilde V(x)} = (1 - x)^{(\alpha-1)/2}(1 + x)^{(\beta-1)/2}.$$

The derivation of AFNM exploits a sequence of polynomials $\{q_k(x)\}$ that are skew-orthogonal with respect to the skew inner product

$$\langle f, g\rangle_1 = \iint\varepsilon(y - x)f(x)g(y)w_1(x)w_1(y)\,dx\,dy.$$

Skew-orthogonality means that

$$\langle q_{2j}, q_{2k+1}\rangle_1 = -\langle q_{2k+1}, q_{2j}\rangle_1 = r_j\delta_{jk}, \qquad \langle q_{2j}, q_{2k}\rangle_1 = \langle q_{2j+1}, q_{2k+1}\rangle_1 = 0.$$

Given such a skew-orthogonal sequence, we may fix the functions $\psi_j$ appearing in (173) by setting $p_k = q_k/\sqrt{r_{[k/2]}}$. Since $M_{jk} = -\langle q_j, q_k\rangle_1/r_{[j/2]}$, it follows that $M^{-1}$ is a direct sum of $L = (N+1)/2$ copies of $\begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}$, so that (49) takes the form

$$S_{N+1,1}(x,y) = \sum_{k=0}^{L-1}\bigl[-\psi_{2k}(x)\varepsilon\psi_{2k+1}(y) + \psi_{2k+1}(x)\varepsilon\psi_{2k}(y)\bigr]. \tag{174}$$
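The sign pattern in (174) can be checked mechanically. Under the normalization $p_k = q_k/\sqrt{r_{[k/2]}}$, the only nonzero entries of $M$ are $M_{2j,2j+1} = -1$ and $M_{2j+1,2j} = 1$. The sketch below (illustrative only, using a plain Gauss–Jordan inverse rather than any library routine) inverts $M$ and reads off the coefficient $-\mu_{jk}$ of $\psi_j(x)(\varepsilon\psi_k)(y)$ in (173).

```python
def skew_moment_matrix(L):
    """M_{jk} = -<q_j, q_k>_1 / r_{[j/2]} for a skew-orthogonal family of size 2L."""
    n = 2 * L
    M = [[0.0] * n for _ in range(n)]
    for j in range(L):
        M[2 * j][2 * j + 1] = -1.0  # -<q_{2j}, q_{2j+1}>_1 / r_j
        M[2 * j + 1][2 * j] = 1.0   # -<q_{2j+1}, q_{2j}>_1 / r_j
    return M


def invert(A):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices)."""
    n = len(A)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kernel_coefficients(Minv):
    """Nonzero coefficients of psi_j(x) (eps psi_k)(y) in S = -sum_{j,k} psi_j mu_{jk} eps psi_k."""
    n = len(Minv)
    return {(j, k): -Minv[j][k] for j in range(n) for k in range(n) if Minv[j][k] != 0.0}
```

The resulting coefficients are $-1$ on $\psi_{2k}\otimes\varepsilon\psi_{2k+1}$ and $+1$ on $\psi_{2k+1}\otimes\varepsilon\psi_{2k}$, which is exactly the pattern in (174).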

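With the following notational dictionary:

$$\begin{array}{ll}
\text{AFNM, Sec. 2} & \text{TW} \\ \hline
e^{-V(x)} & w(x) \\
q_k(x)/\sqrt{r_{[k/2]}} & p_k(x) \\
q_k(x)e^{-V(x)}/\sqrt{r_{[k/2]}} & \psi_k(x) \\
\varepsilon(q_ke^{-V})(x)/\sqrt{r_{[k/2]}} & (\varepsilon\psi_k)(x)
\end{array}$$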
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we may relate Sjv+1,1 to the function S% given in Adler et al. [(2000), (2.9)] 
by 

(175) S N+ i tl (x,y) = Si(y,x). 

Actually S\ is defined in AFNM, Section 4 by modifying their formula (2.9) 
to replace V and qk by V and q^. This modification has already been incor- 
porated in our discussion. 

Rewriting the AFNM summation formula. In this subsection only, for 
consistency with the notation of AFNM, let Pj(x) be the monic orthogonal 
polynomial associated with e~ 2V ^ =W2(x), and define 7jv_i via 

7 jv_i ||pAr-i||! = k n = \{2N + a + /3). 

Then, in the Jacobi case, noting that e ~^ v ^~^^ = y/l-x 2 , AFNM's 
Proposition 4.2 gives the formula, for N + 1 even, 

1 1 - x 2 

Si(x,y) = J — 2 S N)2 {x,y) 

(176) V 

+ 7N-ie- v W PN (y) J £ (x-t)e- v ^ PN ^(t)dt. 

Using the Jacobi polynomial notation of Section 3, we have pn = Pn/In- 
From the definitions, we have \\pn\\ =Vh~N/lN and = \\pn\\/\\pn-i\\- So 
we may write, using (15) and then (52), 

-v(v) i \ w 2 (y) p n{v) <t>N{y) v^v ,, ,,7 / >, 

e y Wp N (y)= — — — = r — = - — = \\pN\\(t>N{y) 

and 

f e(x-t)e- v ^ PN . 1 (t)dt=\\p N . 1 \\(e^ N ^ 1 )(x). 

The second term in (176) becomes 

aN k n 4>n (y ) (e&N-i ) (x) ■ 

Interchanging the roles of x and y as directed by (175), we obtain (50). [We 
remark that the possibility of expressing the orthogonal kernel in terms of 
the unitary kernel plus a finite rank term was shown already by Widom 
(1999).] 
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8.2. Transformation and scaling. As in the unitary case, to describe the 
scaling limit it is convenient to use a nonlinear mapping and rescaling r(s) = 
tanh(^ + as) with [i = fi(N) and a = o~(N) being modified from the unitary 
setting and to be specified as in (157). 

For the matrix kernel appearing in (48) we have, in parallel with (128) 
and its proof, 

det(J - K N+lX0 ) = det(J - K T ), 
where K T is an operator with matrix kernel 



(177) K T (s,t) = ^/r'(s)T'(t)K N+1 (T(s),T(t)). 

Introducing again s^ = t (xf.), our aim is to study the convergence of 

Fn+i{sq) = P\ m a x Sfc<sof=-Pl max x k < x \ 
[l<k<N+l J [l<fc<Af+l J 



(178) 

= yJdet(I-K T ) 

to 



F l (s ) = ^det(I-K GOE ), 
where, following [Tracy and Widom (2005)] 



(179) K GOE (s,t) 



S(s,t) SD(s,t) 
IS(s,t)-e(s-t) S(t,s) 

and the entries of Kqoe are given by 



S(s, t) = S A (s, t) + \ Ai(a) (l - J™ Ai(«) du 
SD(s,t) = -d t S A (s,t) - \ Ai(a)Ai(t), 



(180) 

IS(s,t) = — I SA{u,t)du 



(ps poo poo \ 

J Ai(u)du + J A\(u)du J Ai(u)du), 



where S A is the Airy kernel defined at (142). 

Tracy and Widom (2005) describe with some care the nature of the oper- 
ator convergence of K^ + iXo to Kqoe for the Gaussian finite N ensemble. 
We adapt and extend their approach to the Jacobi finite N ensemble focus- 
ing on the associated N~ 2 ^ s rate of convergence. We therefore repeat, for 
reader convenience, their remarks on weighted Hilbert spaces and regular- 
ized 2-determinants in the current setting. 
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Let p be any weight function for which 

roo i _|_ g 2 

(181) / — —— ds < oo and 

Joo p{s) 

(182) p(s) < C(l + |s| r ) for some positive integer r. 

As in Tracy and Widom (2005), we consider 2x2 Hilbert-Schmidt operator 
matrices T with trace class diagonal entries. Write L 2 (p) and L 2 (p~ 1 ) for 
the spaces L 2 ((sq, oo), p(s) ds) and L 2 ((sq, oo), p~ 1 (s) ds), respectively. We 
regard K T as a 2 x 2 matrix Hilbert-Schmidt operator on L 2 (p) © L 2 (p~ 1 ) 
and note that e:L 2 (p) — ► L 2 (p~ l ) as a consequence of the assumption that 

p-'eL 1 . 

More specifically, if K T = (k\ ^11) ' we re § ar d -f^n and K22 as trace class 
operators on L 2 (p) and L 2 (/? _1 ), respectively, and the off-diagonal elements 
as Hilbert-Schmidt operators 

K 12 :L 2 (p' l )^L 2 {p) and K 2l : L 2 (p) ^ L 2 (p~ l ) . 

Thus trT denotes the sum of the traces of the diagonal elements of T. The 
regularized 2-determinant of a Hilbert-Schmidt operator T with eigenvalues 
p k is defined by det 2 (I -T) = 11(1 - Vk)^ [cf. Gohberg and Krein (1969), 
Section IV. 2]. Using this, one extends the operator definition of determinant 
to Hilbert-Schmidt operator matrices T by setting 

(183) det(J-T) = det 2 (/-T) e - trT . 

As remarked in Tracy and Widom (2005), the resulting notion of det(J— K T ) 
is independent of the choice of p, and allows the derivation of Tracy and Widom 
(1998) that yields (48)-(50). 

To analyze the convergence of pn+i = Fn+i(so) to = F\(so), we note 
that 



\PN+1 "Pool < \P N+ l "Pool/ Poo = C(s )\ Pn+1 - Pod \, 

so that we are led to the difference of determinants 

(184) \F N+1 (s ) - F(s )\ < C(s )\ det(7 - K T ) - det(J - K GOE )Y 

Our basic tool will be a Lipschitz bound on the matrix operator determi- 
nant for operators in the class A of 2 x 2 Hilbert-Schmidt operator matrices 
A = (Aij,i,j = 1,2) on L 2 (p) © L 2 {p~ l ) whose diagonal entries are trace 
class. 

Proposition 3. For A, B e A, we have 
I det(/ -A)- det(J - B)\ < C{A,B){^ \\ A a ~ B uh + E H^i " 4' 

U=l i+j J 
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The coefficient has the form C(A,B) = J2]=i c ij(tr A,tr B)c 2 j(A, B) , where 
c\j and C2j are continuous functions, the latter with respect to the strong 
(Hilbert- Schmidt norm) topology. 

Proof. From determinant definition (183), we have 

det(J -A)- det(J - B) = [det 2 (I - A) - det 2 (J - B)]e~ trA 

(185) 

+ det 2 (/ - B)e~ tr B [e tr B ~ tr A - 1]. 

A Lipschitz bound for the 2-determinant [Gohberg, Goldberg and Krupnik 
(2000), page 196 or Simon (1977), Theorem 6.5] gives 

(186) |det 2 (/- A) - det 2 (I - B)\ < \\A - B\\ 2 exp{±(l + \\A\\ 2 + \\B\\ 2 ) 2 }. 
The first term of (185) is thus bounded by Ci(A, B)\\A — B\\ 2 , where 

d{A,B) = e" tryl exp{i(l + ||A|| 2 + \\B\\ 2 ) 2 } 

has the requisite form. Since ||^4|| 2 < \\Aij\\ 2 and ||^4m|| 2 < the 
first term satisfies the stated bound. 
For the second term, we have 

\trA-trB\<J2\^(Aii-B ii )\<Y,\\Aii-B ii \\ 1 , 

i i 

where we have used the fact that tr^4 = tr^4n +tr A 22 . Thus the second term 
has the required form C 2 (A,B) = ci 2 (tr A,tr B)c 22 (A, B) with c 22 (A,B) = 
det 2 (7 — B) and ci(x,y) = (e~ x — e~ v )/{x — y). Bound (186) shows that c 22 
has the necessary continuity. □ 

8.3. Representation. The next step is to establish a representation for 
K T (s,t) that facilitates the convergence argument. Our starting point is 
(49), which with the matrix definitions 

\ei T J' y ,yj \-e(x-y) 

may be written in the form 

K N+lj x(x,y) = (LS N+1>1 )(x,y) + K e (x,y). 

In the unitary case, S]y, 2 (x,y) transformed to S T (s,t) = e]yS T (s,t). In the 
orthogonal setting, we show that Sn+i,i(x, y) transforms according to 

(187) r'{s)S N+1 (r{s),T{t)) = e N S?(s,t), 
where 

(188) S?(s,t) = S T (s,t) + lM*)W>T)(t). 
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To establish this, we begin by using the relation 
(189) t (s) =acosh~ 2 (^ + as) = a[l - t 2 (s)], 

in combination with the definition ^ T (s) = 4>n(t(s)) and (135) to obtain 
Ms) = (k^)" 1 /2(i _ t 2 ( S )) 1/2 <Mt(s)) 

(190) 



= (aK N a N ) 1/2 \Jt ! (s)<I) N (t{s)). 

Using (189), we may rewrite (50) in the form 

r'(s)S N+1 (r(s),T(t)) 

= v /r / (s)r / (t)% 2 (r(s),r(t))+a 7 v^r , (s)^(-r(s))(e^-i)(r(t)). 

The first term on the right-hand side equals e]\[S(s,t), as may be seen from 
(129) and (136) in the unitary case. Turning to the components of the second 
right-hand side term, we use (189) and (190) to write 

t'(s)^> n (t(s)) = y/a^T'(s)(j) N (T(s)) = a y 'k n o n<Pt{s) ■ 
From the analog of (190) for tp T and x = r(t), we have 

"1 _ roo 

4> N -i(y)dy= / cj> N -i(T(u))T'(u)du 



(191) 

= <JyJ(JN-lKN-l I 1p T (u)du. 



Consequently 

(e4>N-i)(r(t)) = <t^/(7jv-i«jv-i(s^t)(*)- 

Comparing (51) and (138), we find that /2 = aN^N^ 2 ^J^n^n-i^n^n-i- 
Gathering all this together, we obtain the result promised at (187)-(188): 

T'{s)S N+l {T{s),T{t)) = e N S{s,t) + {e N /2)(j) T {s){e^ T ){t) = e N S?(s,t). 

We turn now to the elements of LSn+i i(T(s),r(t)). Temporarily write 
S(s,t) = r / (s)5AT + i ) i(r(s),r(t)). Observing that e(r(s) — r(t)) = e(s — t), we 
have 



(ei5jv+i)(r(a),r(t)) = / e(r{s) -r(u))S N+1 (r(u),T{t))r'(u)du 

e(s — u)S(u, t) du = s±S. 
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Using such change of variables formulae at each matrix entry as needed, we 
obtain 

-1 

I -d 2 ~ 



SN+i,i(r(s),T{t)) 



fJ-I - 

t'(s) r'(s)r'(t) 

£1 -Lt 



S(s,t). 



(*) / 



Note that the operators act with respect to variables (x,y) on the left-hand 
side, and with respect to variables (s, t) on the right. In terms of L, we write 
this as 



l/r'(s) 



1 



(192) (LS N+1>1 )(T(s),T(t)) = ^'' Q ^ lj(LS)(s,t)^ Q 

We now make use of unimodular matrices U(r) = (q x/r)" ^ e nave > f° r 
example, U{a)K £ U{b) = ( ° )e J). Setting a = l/V^s) and b = y/Tif), 
and combining with (192), we obtain 



K T (s, t) = y/r'(s)T'(t)(LS N+ltl + K e )(t(s), r(t)) 

= C/- 1 / 2 (r'( S ))(L5 + K e )( S ,t)C/ 1 / 2 (r'(t)). 

We now remark that the eigenvalues of UKU~ l are the same as those of 
K, and so det(J - UKU~ l ) = det(J - K). Introduce 



(193) 



Qn(s) 



It'(sq) cosh(^ + as) 



t'(s) cosh(^ + aso) ' 



and abbreviate U(qN(s)) by U qN (s). It follows that in place of K T we may 



use 



K T (s,t) = U l l 2 (T'{s Q ))K T {s,t)U- l l 2 (T'{s Q )) = U qN {s)(LS + K e ){s,t)U-Ht). 



Recalling (187), we may summarize by saying that Fn + i(sq) = ydet(J — K T ), 
with 

K T (s,t) = U QN (s)(e N LS? + K e ){s,t)U^{t). 
Remark. For later use, we define 



(194) /3jv_i = ^ / Vv = [ay/a N -iK N -i 



-U 



where the second equality uses (191). Since, here, N + 1 is even, Lemma 
9 both gives an evaluation of (3n-i and also shows that j\<f>N = 0. The 
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analog of the second equality above for cp T then shows that /f^ (f> T = 0. 
Summarizing these remarks, we have 

/oo roo 
(f> T and {pl> T ){t)=p N -x- J ip T 

with 

(196) /3^ 2 _ 1 =o 2 <j N -iK 2 N _ l a N _i{l+e N -i). 

Remark. The need for care in the left tail is dramatized, for example, 



by 



/*oo /"OO re 

= lim lim / <p T ^ lim lim / <fi T = 

N^oo t— »— oo J i t-i-ooff-ioo J[ J — 



Ai = l. 



Formula (194) and limit (220) show that there is a similar problem for tp T . 

For the convergence argument, due to the oscillatory behavior of the Airy 
function in the left tail, it is helpful to rewrite expressions involving e in 
terms of the right-tail integration operator 



(efiOO) = / g{u)du. 

J s 

Thus eg = \ /f^ g — eg, and we may rewrite (195) as 
(197) e<j) T = -e(j) T , e-i/jr = (3 N -i - eipT- 

For kernels A(s,t), we have 

/oo 
A(u,t)du- (e 1 A)(s,t), 
-oo 

where, of course, 

/•OO 

(eiA)(s,t) = J A(u,t)du. 
Using integral representation (137) for S T along with the (194) and f (fi T = 0, 

/OO _ /"OO 
S T (u,t)du = Pn-x I (f>r(t + z)dz = (3 N -i(i(j> T )(t). 
-oo JO 

As a result, we have 

EiS T = |/?JV-1 ® £(^t —iiS T . 

From (197), we have 

Sy = S T - \<j) T ® e^ T + \4> T P N -x, 
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and, combining the last two displays, 

eiSy = -ii(S T - ~(/> T ® e^ r ) + i/3jv-i(l <8>#r -e^r ® 1). 
Defining operator matrices 



' I 


-cY 


^1 = 


' / 0" 




"0 


0" 




T 


-e 


e 


J 



we arrive at an expression for LS^ that involves only right-tail integrations: 
LS« = l(s T - \<j> T <g> £~Vr) + + ^L 2 (f> T (t). 

Let us rewrite the limiting kernel Kqoe in corresponding terms. Until 
the end of this Section 8, we will write A(s) for the Airy function Ai(s) to 
ease notation. For example, 

IS(s,t) = -ei(S A (s,t) - |A(s)(eA)(t)) - \eA{s) + \iA{t) 

and, assembling the other matrix entries correspondingly, we find 

K GO E = L(S A - \A <g> eA) + |Li A(s) + \L 2 A{t) + K £ . 

Summary. In summary, we may represent 

K T = U qN (s)[e N (K? + K? A + < 2 ) + 
i^GOS = K R + + K% + K £ , 
where we have 

K? = L[S T - \4> T ® e0 T ], K R = L[S A - \A ® eA], 

= ±/? 7V -i£i[<Ms)], K[ = ±LiL4(s)], 

K T F 2 = i^iLa^Ct)], Af = |L 2 L4(t)]. 

Our goal is to use inequalities (160)-(161) to obtain an iV~ 2 / 3 rate of 
convergence. Note in particular that ip T = A + An A' + 0(A~ 2 / 3 ), and so 
define .Ajv( s ) = A(s) + AnA'(s). Expression (137) may be rewritten as 

2S T = 4> T o ip T + ip T o 4> T , 

where the convolution like operator o is defined in the obvious way. Replace 
4> T by A and tp T by A^r to define 

S AiV = ^(AoAtv + A^oA) 

= AoA + \A N {AoA' + A'oA). 
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We have 



(AoA' + A'oA)(s,t)= / — [A(s + z)A{t + z)]dz = -A{s)A(t) 



f°° d 



Jo 



so that 



S A 



N 



S A 



\/S. N A <g) A. 



Since ip T = A + An A' + 0(N 2 / 3 ), we will see below that it is convenient 
to set An = A + An A' and to write the difference 



To organize the convergence argument, then, we describe the components 
of K T - K GOE as 



8.4. Convergence. 

8.4.1. Operator bounds. As a preliminary, we need some bounds on Hilbert- 
Schmidt and trace norms for repeated use. First, a remark taken verbatim 
from Tracy and Widom (2005): the norm of a rank-1 kernel u(x)v(y), when 
regarded as an operator u®v taking a space L 2 (pi) to a space L 2 (p2) is 
given by 

(199) ||«®t;|| = IMkpalM^p- 1 - 

(Here norm can be trace, Hilbert-Schmidt or operator norm, since all agree 
for a rank-1 operator.) Indeed the operator takes a function h £ L 2 {p\) to 
u(v, h), and so its norm is the L 2 (p2) norm of u times the norm of v in the 
space dual to L 2 (pi), which is L^pf 1 ). 

Second, an operator T : L 2 (M,p) — > L 2 (M,p') defined by 



K R -K R = l[S T -S A + \ 



\A N A ®A} — ±L[0 r (g> eVv - A ® iA N ] 



(198) K T - K GOE = S R ' D + 5 RI + 5^ + 5f + 6% + <f 

where, in addition to 5 RI and Sq previously defined, 
5 RD = e N U qN {s)K R U^{t) - K R , 



5 RI = L[S T - S A + \A N A ® A], 
5q = -\L[4> T ® £ij) T -A®eA N ], 



S[ = e N U qN (s)K^U- N \t) -Kf, i = 1,2 
5 e = e N U qN (s)K £ U- N \t)-K £ . 
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has Hilbert-Schmidt norm given by 

(200) \\Tf HS = [ I ' K 2 (s,t)dp'(s)dp(t). 



See, for example, Aubin (1979), Chapter 12.1, Proposition 1. 
We use the following notation for a Laplace-type transform: 

roo 

C(p)[t] = / e- tz p{z)dz. 

J So 

Lemma 7. Let D be an operator taking L 2 {p2) to L 2 {p\) have kernel 
D(s,t) = a(s)/3(t)(aob)(s,t), 
where we assume, for s > sq, that 

(201) \a(s)\ <a e Qls , \(3(s)\ < (3 e^ s , 

(202) \a{s)\ < a e- ais , \b(s)\ < b e~ blS . 

Assume that C{p\) and C(p2) both converge for t > 0, and that a\ > 
ai, b\ > Pi. Then 

\\D\\ HS < ^^{C( Pl )[2( ai - a 1 )]£(p 2 )[2(6 1 - Pi)}} 1/2 - 

If Pi = Pii then the trace norm ||.D||i satisfies the same bound. 

Proof. Substituting the bounds for a and b, one finds 

\(aob)(s,t)\<^-e-^-^. 
ai + bi 

The Hilbert-Schmidt bound is a direct consequence of this, (200) and (201): 
\\D\\hs= [ [ D 2 (s,t)p 1 (s)p2(t)dsdt 



V a\+bi J Jsq Js 

For the trace norm bound, we note that D is an integral over z of rank-1 
kernels, and the norm of an integral is at most the integral of the norms. 
Thus, inserting (199), 



oo 



\D\\ < / \\a(s)a{s + z)P(t)b{t + z)\\ 1 dz 
Jo 

roo 

< / ||a(.)a(- + z)||2 l p 1 ||/3(-)6(- + «)ll2 lPr ^- 
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Now insert the bounds assumed at (201) and (202), so that, for example, 

IKH + < ay e- 2a ^C( Pl )[2( ai - a,)}. 

The claimed bound for \\D\\ now follows after integration over z. □ 

We now make a particular choice of p in order to facilitate the operator 
convergence arguments. For a 7 > to be specified later, let 

(203) p(s) = l + e^. 

For notational convenience, we will let p + and p~ be alternate symbols for 
p and p -1 , respectively. With this choice of p, then for 7 < r 



(204) ^(p ± )[r] < 2 / e - T2±7W cfe < 



_1 g-TSO±7|sol 



so T ~7 

Indeed, if so > 0, our bound is immediate. If so < 0, then split the integral 
into 

2 [° e- TZ ^ z dz + — ^— < -^ e -^ T± ^ sa . 



'so t=F7 r-7 

We shall also use a related bound, proved similarly. For |r| < 7, 

/oo 9 
e r2 - 7|2| dz< e -(7-r)»o+ j 

where «o+ = max{so, 0}. 

Corollary 1. Under the assumptions of Lemma 7, and if p\ and p2 
therein are selected from {p,/}" 1 }, and if 7 < 2(ai — ai),2(&i — then 

\\D\\hs, Pill < c aoftfl ° io c- ( '" +il ' ai - ftls ° +7|s ° l , 
ai + 61 

where C = C(a±, ct\ , 61, /3i , 7) . 

Consequence. We will make repeated use of Lemma 7 and Corollary 
1 in the following way. If any one of ao;/3o> a o ° r °o is 

0(iV- 2 / 3 ) while 

the others are uniformly bounded in N, and the bounds (201) and (202) 
apply, then the Hilbert-Schmidt (resp., trace) norms \\D\\ are 0(N~ 2 / 3 ). 
The convergence conditions for C(p2) and C{p^ 1 ) follow from (181)-(182). 
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8.4.2. Convergence details. 

PROOF of Theorem 4. Insert the conclusion of Proposition 3 into 
(184) to obtain 

\F N+1 (s ) -Fi(so) I 

(206) < C(s )C(K t ,K GO e)^2 W K r,a ~ KgoeMU 

+ \\Kr,ij — KGOE,ij\\2 y 

We exploit decomposition (198): the convergence of the matrix entries 
of K T — Kqoe is reduced to establishing the entry wise convergence of (i) 
terms involving integral kernels, 8 R,D and 8 R,J , (ii) finite rank terms 8q,8i 
and $2 , and (iii) a term 8 s involving versions of the convolution operator e. 
We establish both Hilbert-Schmidt and trace norm bounds for the diagonal 
elements and Hilbert-Schmidt bounds for the off-diagonal entries. The dis- 
tinction is moot for the finite rank terms 8f , and 8 £ involves only the (2, 1) 
entry, so the trace bounds are actually also needed only for the 8 R term. 

For each term, we show \\8ij\\ < iV~ 2 / 3 , so that the term {•} in (206) is 
< CN~ 2 / 3 . We have both \\K T - K GO e\\2 and tvK r ~ ^K GO e 

converging 

to at iV" 2 / 3 rate, so that C(K T , K GO e) 

remains bounded as N — > oo. 

8 R terms. For both 8 R,D and 5 R)I , we use Corollary 1 to establish the 
needed Hilbert-Schmidt and trace norm bounds for each entry in the 2x2 
matrices comprising 8 R ' D and 8 R,J . 

§R,i i erm ^ have 8 R)I = L[S T — Sa n ], and 

S T - S An = (<f) T -A)o^ T + Ao{^ T - A N ) + (V> r - A N ) o<l) T + A N o {<j> T - A). 

In turn, for d2(S T — Sa n ) we replace the second slot arguments ip T , (ip T — 
An),4>t and (<p T — A) by their derivatives, and for e(S T — Sa n ), we replace 
the first slot arguments (<f) T — A), A, (ip T — An) and An by their right-tail 
integrals. 

Consider, for example, the first term (<j) T — A) oip T . Use the abbreviation 
D^ijj to denote any of ip',ip or eip. Then we have the bounds 

(207) \D {k \(j) T - A)\ < CiV~ 2/3 e- s/4 , \D {k ^ T \ < Ce~ s . 
We apply Lemma 7 and Corollary 1 with a(s) = /3(s) = 1 and with 

a = CN~ 2 / 3 , b = C, ai = i h = 1. 

The argument is entirely parallel for each of the second through fourth terms. 
Thus, if Dij denotes any matrix entry in any component of 8 RtI , we obtain 

(208) HAjll < C^AT-S/Sg-Sso^+TNI. 
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Table 4 





ato 


Oil 


1 


1 





Qn( s ) 


2e- CTS ° 


(T 


(e N - l)g^(s) 


CN~ 1 e- aso 


a 


(«£(«)-!) 


Q]\[-' 2 / , i e -(<r + 6)s a 


a + 5 



§R,D f erm _ Decompose K R = K R,C + K R)1 into "convolution" and "rank- 
1" terms 

K^ c = LS T , K^ 1 = \L^ T % e*l> T ), 

respectively. Correspondingly, in the following telescoping decomposition, 
we have 

6 R > D = (e N - l)Q N (s)K?Qx\t) + (Q N (s) - I)K?qfc\t) + K?{Q~ N \t) - I) 

_ fiR,D,c _|_ fiR,D,l 

The elements of the component terms of 5 R ' D,C are all of the form a(s)(3(t)(ao 
b)(s,t). In order to apply Lemma 7, we verify conditions (201) and (202). 

Since LS T = {J~ T g ~^f T ) s the terms a o b are all of the form 

D( k) <p T oD m ip T or D (fc) Vvo£> {O 0T, k = 0,-1; 1 = 0,1. 

All functions D^4> T and D^ip T satisfy (202) with a = b = C and ai = 
h = l. 

Inspecting the decomposition above, we see that the multipliers a(s) and 
(3{t) are chosen from the list q%(s), (e^ — l)q^(s) or (q^(s) — 1). To develop 
bounds for <?at(s) and qjj (s), note that c(a,b) = cosh(a + 6)/ cosh a satisfies, 
for all a and b, 

c(a,b) and l/c(a, 6) < 2e |b| , 

(209) 

|c(o,6)-l| and \(l/c(a,b)) - 1| < 2be W . 
These inequalities are applied to qat(s) = c(/x + crso, cr(s — so)), yielding 

(210) \q N (s)\ and |^( S )| < 2 e ^" s °), 

(211) |gjv(«)-l| and \q^ l (s) - 1| < 2a(s - s )e^ 3 ~ s °\ 

As a result, we may collect the bounds for ao and a\ (resp., /3q and f3±) 
in (201) in Table 4. 

In the last line, we have used the bound s — sq < S~ 1 e s ( s ~ s °^ for all s > sq, 
where 5 can be chosen arbitrarily small. Consequently, C depends on 5. 
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Denoting by Dij any matrix entry in any one of the terms of S R,D ' C , we 
therefore have from Corollary 1 

(212) \\D i:j \\ < CjV-2/3 e -2s () +7|so|. 

(Note the cancellation of terms of form asQ or (a + 5)so in the exponent.) 

Many of the remaining steps will involve matrices of rank-1 operators 
on L 2 (p) ® L 2 ( / o~ 1 ). Henceforth, we abbreviate the L 2 norms on L 2 (p) and 
L 2 (p~ l ) by || • ||_|_ and || • ||_, respectively. Let us record, using the remark 
leading to (199), the bound 

<S) 611 [|i ||ai2 <S> &12II2 ^ K ^||aii|| + ||6ii||_ ||ai 2 || + ||&i2||+ 



021 (S> &21 1|2 ||«22 <8> &22||l / V [| «2X (I — ] I i>12 II — ||«22 ||- ||&22 || + 

(213) 

On the left, || • ||i and || • H2 denote trace and Hilbert-Schmidt norms. Indeed, 
apply (199) to aij <g> bij : L 2 (pj) — > L 2 (pi), where pi= p and pi = p~ x . 
Turning now to the S 11 ' 0,1 term, and observing that 

* Tj \-e(/) T ® £lp r eip T ®<t> T 

we see that every term in 5 R ' D ' 1 is of the form adStb, where 

a(s)b(t)=£ j (s)C J (s)Ck(t)h(t), 

where Cj( s ) and Ck{t) are chosen from the list {cp T , e<p T , ip T , eip T } and £j(s),£k{s) 
are chosen from the rows of Table 4, with the conventions that j and k indi- 
cate rows and that one of j, k equals 1 or 2 and the other equals 3 or 4. If we 
abbreviate the bounds summarized in the table by |^j(s)| < CjNe~ l: > So e ljS , 
and then use Corollary 1, we obtain 

e 2l > s - 2s p ± (s)ds 



< CC 2 N e~ 2ljS °e~ 2( ' 1 ~ lj ^ So+ ^ So ^ 



so that 
(214) 



||a®6|| < ||^0llp±H4Cfc||p± 

< C N~ 2 ^ e~ 2s ° + ^ Sa][ . 
Applying these bounds to L(a®eb), we obtain 

(215) \\L(a®eb)\\<{^ B B _ 
where 

A + = \\a\\+, B+ = \\b\\ + , 
A_ = ||ea||_, B- = \\eb\\- 
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For 5q write 

-28$ = L[cf> T <g> e(ip r - A N ) + (4> T -A)® iA N ] 

so that we may apply (215), first with a = 4> T , b = tp T — An and then with 
a = 4> T — A, b = An- 

In the first case, we have from Corollary 1 



/OO POO 
<t? T p<C 2 , 
-0 "' s 



'^+l\s\ ds 



< _ (J 2 g-2so+7|so| 



2-7 

with the same bound applying also to A 2 , . In a similar vein, 
Bl = \\ety T - A N )\\l < C 2 N~^ / e -/*M'l ds 

JSQ 

< 8 c 2 N -4/3 -s /2+y\s \ 
~ 1-27 

with the same bound also for B 2 :. Hence 

(216) A±B± < C( 7 )Af~ 2/3 e- 5so/4+7|so1 . 

The same bound works for the second case, with a = 4> T — A,b = An, as well. 
Sf term. We have 



26f 
2(*f)* 



wjvi ® ^tv 1 ~~ A (8> 1 

— UN2 1 

q^ 1 (g> li/V2 - 1 <8> £A 

c^ 1 <g> wjvi - 1 <8> A 



with 

UNi=iNqN<t>T, u N2 = 'yNq N 1 £(j>T, in = ^nPn-i- 

Using (213), we find that the norms of the first column of Sf are bounded 
by 

\\um - ^Il+H^ 1 !!- + pn+ll^ 1 - 1||_ 

u N2 - eAW-Wq^W- + HeAII-ll^ 1 - 1||_ 

while the norms of the second column of (&f )* are bounded by the same 
quantities, with the rows interchanged. 
From (210), 

roo 

Wqn 1 II- < 4 / e 2a{s - so) -^ sl ds < c e -2^o-(7-2a) S0+ ; 
J s 
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(217) ll^ll-^ 6 " 7 " 72 ' 



f CN~ 2 / 3 e-^ s °/ 2 , s > 0, 

{ CN -2/3 e -(*+S)s ^ Sq<0 



so that 

Ce-™ , s <0. 

Using (211), 

POO 

hN 1 - 1|| 2 _ < C(5)a 2 e^ +5 >° / e 2 ^ 5 ^! 5 ! da, 

so that 

(218) ||g^-l||_< 

Using (158) for 4> T and (211) for — 1, 

||( gjv _ l)0 r ||2 < C , (T 2 e -2(a+5) S0 / e -2(l-«T-*).+7|.| 

so that, using Corollary 1, 

\\{q N -l)(j) T \\ + <Cae- S0+ ^l 2 ^ S0 \. 
A similar argument gives 

lk;v<M+ < c e - so+ ^/ 2) i s °i. 

We have 

(219) ||ujvi " A\\+ < |77V - l|||gjV0r||+ + llfev " l)0r||+ + Il0r - A\\ + . 

First, we show that |^/jv — 1| = 0(N~ 1 ). For e^, refer to (138). For (3n-i, 
we exploit (196) and Lemma 9 (noting that N + 1 is even) to write 

PnLi = C 2 0"A r -1^7V-l a A r -l(l + £ n) = CTN-l^N-l^N-^N-li^ + £jv) 

(220) 

= l + e N , 

where we have used a = cn^n = ctv-i<^v-i(1 + en) and then (139). 

Assembling the bounds developed just above decomposition (219) along 
with (207) yields 

- A\\+ < (CN~ l e~ S0 + CN^e' 80 + CAT- 2 / 3 e - s °/4) e 7M/2 

and 

(221) \\u m - ^H + lk^ll- < CiV- 2 / 3 e -( 1 /2+7) S o/2_ 

With a similar decomposition, 

||wjV2 - < \lN - lHl^e^rll- + Uqn 1 ~ l)e<M- + \\£{4>t - A)||_ 

and these terms are bounded exactly as are the corresponding terms in 
IkiVi - 
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Finally, observe that 

poo 

\\eA\\ 2 ± <C 2 e - 2s +^ s l ds < C( 7 )e- 2so+7 l so l, 

J Sn 



so that 

5 s term. The only nonzero entry in the S e term is 

#21 = [e N qN 1 (s)q]^ 1 (s) - l]e(a - t) 

= [(ejv - ljg^s)^*) + (g^W " + " " *) 

= (h +e 2 +e 3 )(s,t). 

Each of ij(s,t) has the form e a b(s,t) = a(s)b(t)e(s — t). We regard e a b as an 
operator mapping L 2 (M.,dp,(t) = p(t)dt) — ► L 2 (IR, dp'(s) = p~ 1 (s)ds), thus 



0W)(s) = y [a(a)e( S - t)b(t)/p(t)]f(t) dp(t) 
so that K(s,t) in (200) has the form e a b(s,t)/ p(t), and 

IMIhs = I / / a 2 ( S )6 2 (t) /3 - 2 (t)^ , (s)d / u(t) = i||a|| : 



Hence 

||<5|i||h5 < |ejv - lllkiv 1 !!- + Ik^ll-lkiv 1 - 1||- + Pll-lkjv 1 - 1| 
and from (138), (217) and (218), it is bounded by 



(223) 



/ CiN- 1 + iV- 2 / 3 + AT-2/3) e -7so j s > 0, 

1 CiN- 1 + iV- 2 / 3 + AT-2/3) e -(2 CT + 7 ) S o ( SQ < 0. 



We remark that the exponential right-tail bound here is possible due to the 
assumption (203) on the weight function p. 

At last we can assemble the bounds obtained in (208), (212), (214), (216), 
(221), (222) and (223). Each term has a component CN~ 2 / 3 where C de- 
pends on -y,5 and of course a(N)/N and (3(N)/N. We only track the tail 
dependence on sq for sq > 0. With regard to that dependence, (214) is dom- 
inated by (208), and so (206) is bounded by 

(7Af~ 2 / 3 (e~ 5so//4+7So + e - (1+27 ' )S0//4 + e~ s ° + e _7S °). 

It remains to choose a suitable value of 7; it is clear that 7 = i yields a 
bound CiV -2 / 3 e -S0 / 2 . [The choice of 7 could be further optimized, but this 
is perhaps not worthwhile until best bounds are found on the exponential 
rate in (160)-(161).] 
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8.4.3. Summing up. 

Theorem 4. The previous subsection established that 

\F N+1 (s ) - F 1 (s )\ < CN^e~ s °l\ 

with the constant C depending on sq when sq < 0, and, referring to (178) 
and recalling that x = r(s) = tanh(/x + as), we have 

F/v + i(s ) = P{r _1 (x (1 )) < s } = P{(tanh _1 x^) ~ p)/ a ^ s o}- 

The JOE(N + l,a,P) setting is linked to the JUE(N,a,/3) via (50), and 
through equating \x = un and a = tjv- The it-scale centering and scaling 
values are related to their x-scale versions via (133): 

un = tanh -1 xn, t~n = cjv/(1 — 

Finally, on the x-scale, we have from (76) and (112): 

/ , \ 3 2sin 4 (y3 + 7) 

x N = -cos((^ + 7), a N = -^—. ; . 

k n sin sin 7 

Hence, for all \i we arrive at 

,_i 1, 1 + zjv 1, 1 - cos(9? + 7) 

l_ l = UN = tanh x N = - log = - log — — ■ — - 

2 1 — xn 2 l + cos((/? + 7j 

(224) 

= logtan(c/? + 7)/2 



and 



(225) a 6 = t n 



3 _ _3 _ a N 



(1 — x N )' 3 k 2 n sin 2 (ip + 7) siny3sin7 



Theorem 1. While Theorem 1 is just a relabeling of Theorem 4, it may be 
useful to collect the parameterizations and formula leading to the centering 
and scaling expressions (5) and (6). 

First we identify the double Wishart setting of Definition 1 with the appro- 
priate JOE. Identification (27) was made under the additional assumption 
that n >p. If n < p, we use identity (2) and density (7) with parameters 
(p',m',n') = {n,m + n—p,p). In either case, then, we use JOE(N + l,a,f3) 
with the identification 

'iV+l\ / pAn 
a = ?n — p 
(3 J \\n-p\ 

Noting that |n — p\ = p V n — p A n, we have 

k n = 2(N + 1) + a + (3- 1 = m + n- 1, 

a + f3 = m + n — 2{p An), a — (3 = m + n — 2{p V n). 
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Thus, the (20) defining 7 and ip become 



69 



a + (3 1 
cos 7 = = 1 



cos If 



K 

a — f3 



2 (p An- 1/2) 
m + n — 1 

2(pVn-l/2) 
m + n— 1 



which yield the half-angle forms (6). 
Recall that = (1 + Xj)/2 so that 



log- 



log 



1+Xi 



2 tanh 1 Xi . 



1-0, ~°1-Xi 

Thus fj, p = 2(1 and a v = 2a, and so we recover (5) from (224) and (225). 

APPENDIX 

A.l. Proof of Lemma 1. From (57) and (58), respectively, 
h N (N + a)(N + (3) 2N + a + (3 



1 

h N _i ~ N{N + a + (3) 2N + a + (3 + l 

l N -i _ 2N(N + a + (3) 

l N ~ (2N + a + (3){2N + a + (3- 1)' 



and 

whence 

(226) aN ~"[ K{n-mn-2) 

Now use the (a, b) parameters defined at (63)-(65). We have, for example, 
(JV + a)/K=(l + o-l/(2iV + ))/(2 + o + 6), so that 



N(N + a)(N + f3)(N + a + (3) 



nl/2 



a N 



;i + a)(l + 6)(l + a + 6) 



1 1/2 



(2 + a + 6) 4 

From (68) and (69) follow 

sin 2 (p _ (l + a)(l + 6) 
4 (2 + a + 6) 2 ' 

and now (86) is immediate. 



sin 2 7 



[1 + OiN- 1 )}. 



l + a + b 
(2 + a + fe) 2 ' 



A.2. Choice of / and gi in (74) for LG approximation. We elaborate on 
consequences of the key remark that the 0(1/ k) error bound (94) is available 



only if the integral V(() = J ( 



C 



))v 1 / 2 | dv < 00. In view of Remark A, 



we consider convergence at both endpoints, corresponding to a - 
as 6 = 1. 



- 1 as well 
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We refer to arguments in Olver (1974) to show that these convergence 
requirements lead to the specific choices of / and g made in (74). Indeed, 
remarks in Olver [(1974), Section 11.4.1] show that it suffices to show finite 
total variation — > ±1 of the error control function 

F(x)=Jj-^^ 2 {r^)- g r 1/2 d X 

associated with the LG approximations of Olver [(1974), Chapter 6.2] at 
both endpoints. In turn, the discussion of Olver [(1974), Section 6.4.3] shows 
that this is obtained if 

fo = lim(c — x) 2 f(x) > and go = lim(c — x) 2 g(x) = — 1/4, 

for both endpoints c = ±1. 

Return to (70)-(72) and write 

= u 2 f(x)+g(x), 



4(1 -x 2 ) 2 4(1 -x 2 ) 2 

where F(x) and G(x) are quadratic polynomials which clearly determine / 
and g. The requirements on go imply that G(±l) = —4. The coefficient of x 2 
in n(x) is (2N + a + (3 + l) 2 — 1 = k 2 — 1. If, for convenience, we take F(x) to 
be monic, then it is natural to set the large parameter u = k. The condition 
on /o follows from the fact that the zeros x± of n(x) lie in the interior of 
[— 1, 1]. Further, G"(x) = —2, which with the two previous constraints implies 
G(x) = — 3 — x 2 . Since n(l) = 4a 2 —4 and n(— 1) = 4(3 2 — 4, we conclude easily 
that F(l) = 4a 2 /k 2 = 4A 2 and F(-l) = 4/3 2 / ' k 2 = 4/i 2 , thus arriving at the 
expressions (74) for f(x) and g(x). 

A. 3. Error control function. Clearly, for xo < x < 1 we have V(C(x)) < 
V(C(^o))> and it is our goal here to show that V(((x);A,fi) is uniformly 
bounded for (A, fx) £ Dg- 

We first observe that 

V(ax )) = V [xOjXl] (H)+V [xi>1] (H), 

where xq = \{x + + xJ) is defined before (96) and x\ > x + will be specified 
below. We then note that since H'(x) = — C|CI _1//2 V'(C)) we have 

v (m- f c{xi) MM at 

Our approach is to use the fact that $(\zeta, \lambda, \mu) \mapsto \psi(\zeta; \lambda, \mu)$ is continuous, and hence is bounded, by $M(\delta)$ say, on the compact set of $(\zeta, \lambda, \mu)$ for which both $\zeta \in [\zeta(x_0(\lambda, \mu)), \zeta(x_1)]$ and $(\lambda, \mu) \in D_\delta$; here we use the continuity of $x_0$ in $(\lambda, \mu)$ and of $\zeta$ in $x$. For $(\lambda, \mu)$ in $D_\delta$, there exist finite bounds $\zeta_-(\delta) < \zeta(x_0(\lambda, \mu))$ and $\zeta_+(\delta) > \zeta(x_1)$ so that

$$\mathcal V_{[x_0, x_1]}(H) \le M(\delta) \int_{\zeta_-(\delta)}^{\zeta_+(\delta)} |\zeta|^{-1/2}\,d\zeta \le M_1(\delta).$$

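Continuity of $\psi(\zeta; \lambda, \mu)$ is a consequence of Olver (1974), Lemma 11.3.1, and the continuous dependence of $f$, and hence of $\zeta$, on $(\lambda, \mu)$. Indeed, the lemma uses the decomposition

$$\hat f(x; \lambda, \mu) = \frac{f(x; \lambda, \mu)}{\zeta(x; \lambda, \mu)} = \{p(x; \lambda, \mu)\}^2 \{\tfrac32 q(x; \lambda, \mu)\}^{-2/3},$$

where

$$p(x) = \frac{(x - x_-)^{1/2}}{2(1 - x^2)} \qquad\text{and}\qquad q(x) = (x - x_+)^{-3/2} \int_{x_+}^{x} (t - x_+)^{1/2} p(t)\,dt,$$

and as $x \to x_+$, $q(x) \to \tfrac23 p(x_+)$. For $(\lambda, \mu) \in D_\delta$, we have $x_+ - x_- \ge \delta_3(\delta) > 0$, and so the continuous dependence of $x_-$ and $x_+$ on $(\lambda, \mu)$ carries through to $\psi$.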

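Turning to $\mathcal V_{[x_1, 1]}(H)$, note from Olver (1974), (11.4.01), that $H(x) = F(x) + \tfrac{5}{24}\zeta^{-3/2}$, so that

$$\mathcal V_{[x_1, 1]}(H) \le \mathcal V_{[x_1, 1]}(F) + \tfrac{5}{24}\zeta(x_1)^{-3/2}.$$

Since $\zeta(x_1)$ is bounded below by a positive constant on $D_\delta$, it remains to bound

$$\mathcal V_{[x_1, 1]}(F) = \int_{x_1}^{1} |\mathcal C(x)|\,dx,$$

where $\mathcal C = f^{-1/4}(f^{-1/4})'' - g f^{-1/2}$.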
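To organize the calculation, write

$$f(x) = (1 - x)^{-2}\tilde f(x), \qquad \tilde f(x) = \frac{\lambda^2}{4} + (1 - x) f_1(x), \qquad f_1(x) = \frac{(1 + x)(\lambda^2 - 1) + 2\mu^2}{4(1 + x)^2},$$

$$g(x) = (1 - x)^{-2}\tilde g(x), \qquad \tilde g(x) = -\frac14 + (1 - x) g_1(x), \qquad g_1(x) = -\frac{1}{2(1 + x)^2},$$

from which one obtains

$$f^{-1/4}(f^{-1/4})'' = -\tfrac14 (1 - x)^{-1} \tilde f^{-1/2} + \mathcal D,$$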






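where

$$\mathcal D = \frac{\tilde f'}{4\tilde f^{3/2}} - \frac{1 - x}{\tilde f^{1/2}} \left\{ \frac{\tilde f''}{4\tilde f} - \frac{5}{16}\Big(\frac{\tilde f'}{\tilde f}\Big)^2 \right\}.$$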



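Since we have arranged the decomposition $\kappa^2 f + g$ precisely so that $\tilde g = -\tfrac14 + (1 - x) g_1$, we have $g f^{-1/2} = (1 - x)^{-1}\tilde g\,\tilde f^{-1/2} = -\tfrac14 (1 - x)^{-1}\tilde f^{-1/2} + g_1 \tilde f^{-1/2}$. Hence the $(1 - x)^{-1}$ term cancels and

$$\mathcal C = f^{-1/4}(f^{-1/4})'' - g f^{-1/2} = -\tilde f^{-1/2} g_1 + \mathcal D.$$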



Consequently, to show that $\mathcal C$ is bounded uniformly over $D_\delta$ on $[x_1, 1]$, it is enough to choose $x_1$ close enough to 1 so that

$$\inf \tilde f(x) \ge \delta_4(\delta) > 0,$$

where the infimum is taken over $x \in [x_1, 1]$ and $(\lambda, \mu) \in D_\delta$. Indeed, $|g_1|$, $|\tilde f'|$ and $|\tilde f''|$ are then uniformly bounded for such $(x, \lambda, \mu)$.
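The cancellation of the $(1-x)^{-1}$ terms can be checked numerically. The sketch below, assuming $f = F/(4(1-x^2)^2)$ with $F(x) = x^2 + 2(\lambda^2 - \mu^2)x + 2(\lambda^2 + \mu^2) - 1$ and $g = -(3 + x^2)/(4(1-x^2)^2)$, evaluates $\mathcal C$ in two ways: directly, using a central second difference of $f^{-1/4}$, and through the split form $-\tilde f^{-1/2}g_1 + \mathcal D$ with closed-form derivatives of $\tilde f = F/(4(1+x)^2)$. The values $\lambda = 0.3$, $\mu = 0.2$ (for which $x_+ \approx 0.81$), the step size and the function names are illustrative choices.

```python
import math

def F_and_derivs(x, lam, mu):
    """F(x) = x^2 + 2(lam^2 - mu^2) x + 2(lam^2 + mu^2) - 1 and its first two derivatives."""
    F = x * x + 2 * (lam**2 - mu**2) * x + 2 * (lam**2 + mu**2) - 1
    return F, 2 * x + 2 * (lam**2 - mu**2), 2.0

def C_direct(x, lam, mu, h=1e-4):
    """f^{-1/4} (f^{-1/4})'' - g f^{-1/2}, second derivative by central differences."""
    f = lambda y: F_and_derivs(y, lam, mu)[0] / (4 * (1 - y * y) ** 2)
    g = lambda y: -(3 + y * y) / (4 * (1 - y * y) ** 2)
    phi = lambda y: f(y) ** -0.25
    phi_xx = (phi(x + h) - 2 * phi(x) + phi(x - h)) / h**2
    return phi(x) * phi_xx - g(x) * f(x) ** -0.5

def C_split(x, lam, mu):
    """-tilde_f^{-1/2} g_1 + D: no (1 - x)^{-1} terms remain, so x = 1 is allowed."""
    F, dF, d2F = F_and_derivs(x, lam, mu)
    ft = F / (4 * (1 + x) ** 2)                                     # tilde f
    ft1 = dF / (4 * (1 + x) ** 2) - F / (2 * (1 + x) ** 3)          # tilde f'
    ft2 = d2F / (4 * (1 + x) ** 2) - dF / (1 + x) ** 3 + 1.5 * F / (1 + x) ** 4
    g1 = -1 / (2 * (1 + x) ** 2)
    D = ft1 / (4 * ft**1.5) - (1 - x) / ft**0.5 * (ft2 / (4 * ft) - 5 / 16 * (ft1 / ft) ** 2)
    return -g1 / ft**0.5 + D

lam, mu = 0.3, 0.2                     # illustrative; here x_+ is about 0.81
for x in (0.9, 0.93, 0.95):
    print(x, C_direct(x, lam, mu), C_split(x, lam, mu))
print("C at x = 1:", C_split(1.0, lam, mu))
```

The split form can be evaluated at $x = 1$ itself, which is the numerical counterpart of the boundedness claim; the direct form cannot, since each of its two terms grows like $(1-x)^{-1}$.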

A.4. Behavior of LG transform as $x \to 1$. Write the leading term $f(x)$ of (74) in terms of

$$\sqrt{f(x)} = \frac{R(x)}{2(1 - x^2)}, \qquad R(x) = \sqrt{(x - x_+)(x - x_-)}. \tag{227}$$

Thus

$$I(x) := \tfrac23 \zeta^{3/2} = \frac12 \int_{x_+}^{x} \frac{R(x')}{1 - x'^2}\,dx'. \tag{228}$$



Proposition 4. Let $N$ be fixed. As $x \to 1$,

$$4I(x) = 4\int_{x_+}^{x} \sqrt{f(x')}\,dx' = \frac{2a}{2 + a + b}\log(1 - x)^{-1} + c_{0N} + o(1), \tag{229}$$

where

$$c_{0N} = \frac{2}{2 + a + b} \log\left[\frac{(2a^2)^a (1 + b)^{1+b}}{(1 + a)^{1+a}(1 + a + b)^{1+a+b}}\right]. \tag{230}$$

[Recall that the turning points $x_\pm$ of (75) and (76) are related by $\lambda, \mu$ of (63) to $a, b$ defined at (64).]

Proof. To ease notation, introduce new variables $s = 1 + x$ and $t = 1 - x$, which are both positive for $|x| < 1$. With slight abuse, we set

$$R(s) = \sqrt{(s - s_+)(s - s_-)}, \qquad s_\pm = 1 + x_\pm, \tag{231}$$

$$R(t) = \sqrt{(t - t_-)(t - t_+)}, \qquad t_\pm = 1 - x_\mp, \tag{232}$$

and note that $R(s) = R(t) = R(x)$ for $|x| < 1$. Since $1 - x'^2 = s't'$ and $2/(s't') = 1/s' + 1/t'$, we obtain

$$4I(x) = \int_{s_+}^{s} \frac{R(s')}{s'}\,ds' + \int_{t}^{t_-} \frac{R(t')}{t'}\,dt'. \tag{233}$$



The following "elementary" indefinite integral formula may be derived from Gradshteyn and Ryzhik (1980), 2.267, 2.261 and 2.266. Let $u$ denote any of the variables $x$, $s$ or $t$, so that $R(u) = \sqrt{(u - u_+)(u - u_-)}$, and set

$$\hat u = (u_+ + u_-)/2, \qquad \bar u = \sqrt{u_+ u_-}. \tag{234}$$

Then

$$\int \frac{R(u)}{u}\,du = R(u) - \bar u \log\left| u^{-1}\big(\hat u u - \bar u^2 - \bar u R(u)\big)\right| - \hat u \log\left| u - \hat u + R(u) \right|.$$



Before using this to evaluate the definite integrals in (233), note that

$$\operatorname{sgn}[\hat u u - \bar u^2 - \bar u R(u)] = \operatorname{sgn}[\hat u u - \bar u^2], \tag{235}$$

$$\operatorname{sgn}[u - \hat u + R(u)] = \operatorname{sgn}[u - \hat u], \tag{236}$$

and all four quantities are positive if $u > u_+$ and negative if $u < u_-$. [For (236), let $\Delta = (u_+ - u_-)/2$ and observe that $(u - \hat u)^2 - R^2(u) = \Delta^2 > 0$, while for (235), note that $(\hat u u - \bar u^2)^2 - \bar u^2 R^2(u) = u^2 \Delta^2 > 0$.]
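Because a sign slip in this antiderivative would propagate into $c_{0N}$, a quick numerical check may be useful. The sketch below differentiates the right-hand side by central differences and compares the result with $R(u)/u$ on both sides of $[u_-, u_+]$, taking positive $u_\pm$ as in the application to $s$ and $t$. The values of $u_\pm$, the step size and the function names are illustrative assumptions; the absolute values make the single formula valid on each of the two intervals, in line with the sign remarks above.

```python
import math

def antiderivative(u, u_minus, u_plus):
    """R(u) - u_bar log|u^{-1}(u_hat u - u_bar^2 - u_bar R)| - u_hat log|u - u_hat + R|."""
    u_hat = (u_plus + u_minus) / 2
    u_bar = math.sqrt(u_plus * u_minus)
    R = math.sqrt((u - u_plus) * (u - u_minus))
    return (R
            - u_bar * math.log(abs((u_hat * u - u_bar**2 - u_bar * R) / u))
            - u_hat * math.log(abs(u - u_hat + R)))

def check_derivative(u, u_minus, u_plus, h=1e-5):
    """Central-difference derivative of the antiderivative, paired with R(u)/u."""
    d = (antiderivative(u + h, u_minus, u_plus)
         - antiderivative(u - h, u_minus, u_plus)) / (2 * h)
    R = math.sqrt((u - u_plus) * (u - u_minus))
    return d, R / u

for u in (0.2, 0.4, 2.5, 6.0):        # outside [u_-, u_+] = [0.5, 2.0]
    print(u, check_derivative(u, 0.5, 2.0))
```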

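As a result, we have

$$\int_{s_+}^{s} \frac{R(s')}{s'}\,ds' = R(s) - \bar s \log \frac{\hat s s - \bar s^2 - \bar s R(s)}{s\,(s_+ - \hat s)} - \hat s \log \frac{s - \hat s + R(s)}{s_+ - \hat s},$$

with an analogous expression for the integral in $t$ in (233), namely

$$\int_{t}^{t_-} \frac{R(t')}{t'}\,dt' = -R(t) + \bar t \log \frac{\bar t R(t) + \bar t^2 - \hat t t}{t\,(\hat t - t_-)} + \hat t \log \frac{\hat t - t - R(t)}{\hat t - t_-}.$$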



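Define $\delta$ through the equations $\delta^{-1} = s_+ - \hat s = t_+ - \hat t = x_+ - \hat x$. Adding the two previous displays, and using $R(s) = R(t)$ together with $\hat t - t_- = t_+ - \hat t$, yields

$$4I(x) = -\bar s \log\{\delta s^{-1}(\hat s s - \bar s^2 - \bar s R(s))\} - \hat s \log\{\delta(s - \hat s + R(s))\} \tag{237}$$

$$\qquad + \bar t \log\{\delta t^{-1}(\bar t R(t) + \bar t^2 - \hat t t)\} + \hat t \log\{\delta(\hat t - t - R(t))\}. \tag{238}$$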



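As $x \nearrow 1$, we have $s \nearrow 2$ and $t \searrow 0$, and so, noting that $R(1) = \sqrt{t_+ t_-} = \bar t$ and $2 - \hat s = \hat t$, we obtain $4I(x) = \bar t \log t^{-1} + c_{0N} + o(1)$. This is the desired approximation (229), with

$$c_{0N} = -\bar s \log T_1 - \hat s \log T_2 + \bar t \log T_3 + \hat t \log T_4, \tag{239}$$

where

$$T_1 = \tfrac12 \delta(2\hat s - \bar s^2 - \bar s \bar t), \qquad T_2 = \delta(\hat t + \bar t), \qquad T_3 = 2\delta \bar t^2, \qquad T_4 = \delta(\hat t - \bar t).$$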
To convert the previous expression for $c_{0N}$ to that given in terms of $a, b$ in (230), we proceed via the angle parameters $\gamma, \varphi$ of (66). In preparation, we set out, in parallel for the $s$ and $t$ variables, some relations that follow from the definitions, the fact that $x_\pm = -\cos\varphi\cos\gamma \pm \sin\varphi\sin\gamma$, and some algebra:

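$$\bar s^2 = s_+ s_- = (1 + x_+)(1 + x_-), \qquad \bar t^2 = t_+ t_- = (1 - x_+)(1 - x_-), \tag{240}$$

$$\bar s^2 = (1 - \cos\varphi\cos\gamma)^2 - \sin^2\varphi\sin^2\gamma, \qquad \bar t^2 = (1 + \cos\varphi\cos\gamma)^2 - \sin^2\varphi\sin^2\gamma, \tag{241}$$

$$\bar s^2 = (\cos\gamma - \cos\varphi)^2, \qquad \bar t^2 = (\cos\gamma + \cos\varphi)^2, \tag{242}$$

so that

$$\bar s = \cos\gamma - \cos\varphi = 2\mu, \qquad \bar t = \cos\gamma + \cos\varphi = 2\lambda, \tag{243}$$

$$\hat s = 1 - \cos\varphi\cos\gamma = 1 + \hat x, \qquad \hat t = 1 + \cos\varphi\cos\gamma = 1 - \hat x \tag{244}$$

and, since $\bar s + \bar t = 2\cos\gamma$,

$$\hat s - \bar s(\bar s + \bar t)/2 = 1 - \cos\varphi\cos\gamma - (\cos\gamma - \cos\varphi)\cos\gamma = \sin^2\gamma, \tag{245}$$

while

$$\hat t + \bar t = (1 + \cos\gamma)(1 + \cos\varphi), \qquad \hat t - \bar t = (1 - \cos\gamma)(1 - \cos\varphi).$$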
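From (68) and (69) for $\cos\gamma$ and $\cos\varphi$,

$$\frac{1 \pm \cos\gamma}{\sin\gamma} = (1 + a + b)^{\pm 1/2}, \qquad \frac{1 \pm \cos\varphi}{\sin\varphi} = \Big(\frac{1 + a}{1 + b}\Big)^{\pm 1/2},$$

and since (78) shows that $\delta^{-1} = \sin\varphi\sin\gamma$, we find that

$$\delta(\hat t \pm \bar t) = \frac{1 \pm \cos\gamma}{\sin\gamma}\cdot\frac{1 \pm \cos\varphi}{\sin\varphi} = \left[\frac{(1 + a)(1 + a + b)}{1 + b}\right]^{\pm 1/2}.$$

Thus $T_4 = 1/T_2$, and so, since $\hat s + \hat t = 2$,

$$-\hat s \log T_2 + \hat t \log T_4 = -2\log T_2 = \log[(1 + a)^{-1}(1 + b)(1 + a + b)^{-1}].$$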
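Using now (245) and (240) and similar trigonometric manipulations,

$$T_1 = \frac{\sin\gamma}{\sin\varphi} = \frac{\sqrt{1 + a + b}}{\sqrt{1 + a}\,\sqrt{1 + b}}, \tag{246}$$

$$T_3 = \frac{2(\cos\gamma + \cos\varphi)^2}{\sin\gamma\,\sin\varphi} = \frac{2a^2}{\sqrt{1 + a + b}\,\sqrt{1 + a}\,\sqrt{1 + b}}, \tag{247}$$

so that

$$-\bar s \log T_1 + \bar t \log T_3 = -\mu \log[(1 + a)^{-1}(1 + b)^{-1}(1 + a + b)] + \lambda \log[4a^4 (1 + a)^{-1}(1 + b)^{-1}(1 + a + b)^{-1}].$$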

We obtain expression (230) for $c_{0N}$ from (239) by combining the previous displays and computing $1 \pm \lambda \mp \mu$ and $1 + \lambda + \mu$ using (65). □
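Proposition 4 can also be checked by direct quadrature. The sketch below assumes $\lambda = a/(2 + a + b)$ and $\mu = b/(2 + a + b)$, the parametrization under which $\bar t = 2\lambda$ matches the coefficient in (229); treating this as the content of (63)-(65) is an assumption here. It computes $c_{0N}$ as $2\lambda\log(1 - x_+) + 4\int_{x_+}^{1}[\sqrt f - \lambda/(2(1-x))]\,dx$, where the subtracted term carries the logarithmic singularity at $x = 1$, and it rewrites the integrand via $R^2 - \lambda^2(1+x)^2 = (1-x)[(1+x)(\lambda^2 - 1) + 2\mu^2]$ to avoid cancellation. The substitution $x = x_+ + v^2$, Simpson's rule and the grid size are illustrative choices.

```python
import math

def c0N_closed_form(a, b):
    """Right-hand side of (230)."""
    bracket = ((2 * a * a) ** a * (1 + b) ** (1 + b)
               / ((1 + a) ** (1 + a) * (1 + a + b) ** (1 + a + b)))
    return 2 / (2 + a + b) * math.log(bracket)

def c0N_numeric(a, b, n=20000):
    """lim_{x->1} [4 I(x) - 2 lam log(1-x)^{-1}] by quadrature (n must be even)."""
    lam, mu = a / (2 + a + b), b / (2 + a + b)
    half_width = math.sqrt((1 - (lam + mu) ** 2) * (1 - (lam - mu) ** 2))
    x_minus = mu**2 - lam**2 - half_width
    x_plus = mu**2 - lam**2 + half_width

    def excess(x):
        # sqrt(f(x)) - lam / (2(1 - x)), rewritten without cancellation at x = 1
        R = math.sqrt(max((x - x_plus) * (x - x_minus), 0.0))
        return ((1 + x) * (lam**2 - 1) + 2 * mu**2) / (2 * (1 + x) * (R + lam * (1 + x)))

    # Substitute x = x_plus + v^2 so the integrand is smooth at the turning point,
    # then apply composite Simpson's rule on [0, sqrt(1 - x_plus)].
    v_max = math.sqrt(1 - x_plus)
    h = v_max / n
    total = 0.0
    for i in range(n + 1):
        v = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * 2 * v * excess(x_plus + v * v)
    return 2 * lam * math.log(1 - x_plus) + 4 * total * h / 3

for a, b in [(1.0, 1.0), (0.5, 2.0), (3.0, 0.25)]:
    print(a, b, c0N_numeric(a, b), c0N_closed_form(a, b))
```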
A.5. Identification of $c_N$. We first remark that, as $\zeta \to \infty$ when $x \to 1$, we may substitute the large-$x$ behavior of $\mathrm{Ai}(x)$, given by $\mathrm{Ai}(x) \sim [2\sqrt\pi\,x^{1/4}]^{-1}\exp\{-\tfrac23 x^{3/2}\}$, to obtain

$$w_2(x, \kappa) \sim [2\sqrt\pi]^{-1}\kappa^{-1/6} f^{-1/4}(x) \exp\{-\tfrac23 \kappa \zeta^{3/2}\}. \tag{248}$$

Consequently, using (228), we may express $c_N$ in terms of the limit

$$c_N = \lim_{x \to 1} w_N(x) \cdot 2\sqrt\pi\,\kappa^{1/6} f^{1/4}(x) \exp\{\kappa I(x)\}.$$

Consider first the dependence on $x$. From (70) and (59), we have, as $x \to 1$,

$$w_N(x) \sim \tilde w_N (1 - x)^{(\alpha + 1)/2}, \qquad \tilde w_N = 2^{(\beta + 1)/2}\binom{N + \alpha}{N}. \tag{249}$$

Since $R(x) \to \bar t = 2\lambda$ [compare (240) and (234)], we have from (227) that

$$f^{1/4}(x) \sim \sqrt{\lambda/2}\,(1 - x)^{-1/2},$$

while Proposition 4, along with (65) and (63), implies

$$\exp\{\kappa I(x)\} \sim e^{\kappa c_{0N}/4}(1 - x)^{-\alpha/2}.$$

Multiply the last three displays: the resulting exponent of $(1 - x)$ is $(\alpha + 1)/2 - 1/2 - \alpha/2 = 0$. Consequently

$$c_N = \tilde w_N \cdot 2\sqrt\pi\,\kappa^{1/6}\sqrt{\lambda/2}\,e^{\kappa c_{0N}/4}. \tag{250}$$

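Using $[\;\cdot\;]$ to denote the quantity in brackets in (230), and noting that $\kappa = N_+(2 + a + b)$ with $N_+ = N + 1/2$, we have

$$e^{\kappa c_{0N}/4} = [\;\cdot\;]^{N_+/2} = 2^{\alpha/2}\left[\frac{a^a}{(1 + a)^{1+a}}\right]^{N_+}\left[\frac{(1 + a)^{1+a}(1 + b)^{1+b}}{(1 + a + b)^{1+a+b}}\right]^{N_+/2}.$$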



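Lemma 8. Let $N_+ = N + 1/2$, $\alpha = N_+ a$ and $\beta = N_+ b$. There exist bounded remainders $\theta_1(a)$ and $\theta_2(a, b)$ such that

$$\binom{N + \alpha}{N} = \frac{1}{\sqrt{2\pi N_+ a}}\left[\frac{(1 + a)^{1+a}}{a^a}\right]^{N_+} \exp\{\theta_1(a)/N\} \tag{251}$$

and

$$\frac{(N + \alpha)!\,(N + \beta)!}{N!\,(N + \alpha + \beta)!} = \left[\frac{(1 + a)^{1+a}(1 + b)^{1+b}}{(1 + a + b)^{1+a+b}}\right]^{N_+} \exp\{\theta_2(a, b)/N\}. \tag{252}$$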
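Proof. Stirling's approximation $x! = \sqrt{2\pi}\,e^{-x} x^{x + 1/2} e^{\theta/x}$ has $0 < \theta < 1/12$. Consequently, since $N + \alpha + 1/2 = N_+(1 + a)$ and $N + 1/2 = N_+$,

$$\binom{N + \alpha}{N} = \frac{(N + \alpha)!}{N!\,\alpha!} = \frac{1}{\sqrt{2\pi}}\,\frac{[N_+(1 + a) - 1/2]^{N_+(1 + a)}}{[N_+ - 1/2]^{N_+}\,\alpha^{\alpha + 1/2}} \exp\{\theta(a)/N\}$$

$$= \frac{1}{\sqrt{2\pi N_+ a}}\left[\frac{(1 + a)^{1+a}}{a^a}\right]^{N_+} \frac{\big(1 - 1/[2N_+(1 + a)]\big)^{N_+(1 + a)}}{\big(1 - 1/[2N_+]\big)^{N_+}} \exp\{\theta(a)/N\}.$$

The result (251) follows from the relation $(1 - v/N)^N = \exp\{-v - \theta v^2 N^{-1}\}$, where $0 < \theta < 1$ for $N > 2v$. The argument for (252) is completely analogous. □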
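Both expansions can be compared with exact log-gamma values. The sketch below assumes only the parametrization $\alpha = N_+ a$, $\beta = N_+ b$ of Lemma 8 and prints $N$ times the difference between the exact logarithm and the logarithm of the main term (that is, without the $\exp\{\theta/N\}$ factor). If the remainders have the form $\theta/N$ with $\theta$ bounded, these products should stay bounded as $N$ grows. The values $a = 0.7$, $b = 1.3$ and the function names are arbitrary illustrative choices.

```python
import math

def log_binom_exact(N, a):
    """log C(N + alpha, N) with alpha = (N + 1/2) a, via log-gamma (alpha need not be an integer)."""
    alpha = (N + 0.5) * a
    return math.lgamma(N + alpha + 1) - math.lgamma(N + 1) - math.lgamma(alpha + 1)

def log_binom_251(N, a):
    """Logarithm of the main term of (251)."""
    Np = N + 0.5
    return (-0.5 * math.log(2 * math.pi * Np * a)
            + Np * ((1 + a) * math.log(1 + a) - a * math.log(a)))

def log_ratio_exact(N, a, b):
    """log[(N+alpha)! (N+beta)! / (N! (N+alpha+beta)!)] with alpha = N_+ a, beta = N_+ b."""
    Np = N + 0.5
    alpha, beta = Np * a, Np * b
    return (math.lgamma(N + alpha + 1) + math.lgamma(N + beta + 1)
            - math.lgamma(N + 1) - math.lgamma(N + alpha + beta + 1))

def log_ratio_252(N, a, b):
    """Logarithm of the main term of (252)."""
    xlogx = lambda z: z * math.log(z)
    return (N + 0.5) * (xlogx(1 + a) + xlogx(1 + b) - xlogx(1 + a + b))

for N in (10, 100, 1000, 10000):
    e1 = log_binom_exact(N, 0.7) - log_binom_251(N, 0.7)
    e2 = log_ratio_exact(N, 0.7, 1.3) - log_ratio_252(N, 0.7, 1.3)
    print(N, N * e1, N * e2)
```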

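Combining Lemma 8 with the expression (249) for $\tilde w_N$, we have

$$\tilde w_N = \frac{2^{(\beta + 1)/2}}{\sqrt{2\pi\alpha}}\left[\frac{(1 + a)^{1+a}}{a^a}\right]^{N_+} \exp\{\theta_1(a)/N\}.$$

Substituting this into (250), we finally obtain (99).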

Lemma 9.

$$\int_{-1}^{1} \phi_N = 2(\kappa_N \sigma_N)^{-1/2}(1 + \epsilon_N) \quad \text{for } N \text{ even}, \tag{253}$$

and $\int_{-1}^{1}\phi_N = 0$ for $N$ odd.

Proof. From Nagao and Forrester (1995), (A.7), we have

$$\int_{-1}^{1} (1 - x)^{(\alpha - 1)/2}(1 + x)^{(\beta - 1)/2} P_N^{(\alpha, \beta)}(x)\,dx = 2^{(\alpha + \beta)/2} \frac{\Gamma((N + \alpha + 1)/2)\,\Gamma((N + \beta + 1)/2)}{\Gamma((N + \alpha + \beta + 2)/2)\,\Gamma((N + 2)/2)} \tag{254}$$

if $N$ is even, and zero if $N$ is odd. [Identify our parameters $(N, \alpha, \beta)$ with NF's $(n, 2b + 1, 2a + 1)$ after noting that they use the opposite convention for Jacobi polynomial indices: our $P_N^{(\alpha, \beta)}$ is their $P_N^{(2a + 1, 2b + 1)}$.]
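Formula (254) can be spot-checked without appealing to Nagao and Forrester. The sketch below evaluates $P_N^{(\alpha,\beta)}$ with the standard three-term recurrence for Jacobi polynomials and integrates by composite Simpson's rule. To keep the quadrature essentially exact it restricts to odd integers $\alpha, \beta \ge 1$, so that the integrand is a polynomial; that restriction, the grid size and the function names are illustrative assumptions.

```python
import math

def jacobi_p(n, alpha, beta, x):
    """P_n^(alpha,beta)(x) by the standard three-term recurrence in the degree."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, (alpha + 1) + (alpha + beta + 2) * (x - 1) / 2
    for k in range(2, n + 1):
        s = 2 * k + alpha + beta
        a1 = 2 * k * (k + alpha + beta) * (s - 2)
        a2 = (s - 1) * (s * (s - 2) * x + alpha**2 - beta**2)
        a3 = 2 * (k + alpha - 1) * (k + beta - 1) * s
        p_prev, p = p, (a2 * p - a3 * p_prev) / a1
    return p

def lhs_254(N, alpha, beta, m=4000):
    """Composite Simpson approximation (m even) to the integral in (254)."""
    def integrand(x):
        return ((1 - x) ** ((alpha - 1) / 2) * (1 + x) ** ((beta - 1) / 2)
                * jacobi_p(N, alpha, beta, x))
    h = 2.0 / m
    total = integrand(-1.0) + integrand(1.0)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * integrand(-1.0 + i * h)
    return total * h / 3

def rhs_254(N, alpha, beta):
    """Right-hand side of (254); zero for odd N."""
    if N % 2:
        return 0.0
    G = math.gamma
    return (2 ** ((alpha + beta) / 2) * G((N + alpha + 1) / 2) * G((N + beta + 1) / 2)
            / (G((N + alpha + beta + 2) / 2) * G((N + 2) / 2)))

for N, alpha, beta in [(2, 1, 1), (2, 3, 1), (3, 3, 5), (6, 3, 5), (7, 5, 1)]:
    print(N, alpha, beta, lhs_254(N, alpha, beta), rhs_254(N, alpha, beta))
```

For example, $P_2^{(1,1)}(x) = \tfrac{15}{4}x^2 - \tfrac34$ integrates to 1 over $[-1, 1]$, which is what the right-hand side gives for $(N, \alpha, \beta) = (2, 1, 1)$.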

The function $\phi_N$ equals $h_N^{-1/2}$ times the integrand of (254), and so after combining this integral with expression (57) for $h_N$, we obtain

$$\int_{-1}^{1} \phi_N = \left[\frac{\kappa}{2(N + 1)(N + \alpha + \beta + 1)}\right]^{1/2} \frac{r(N + \alpha)\,r(N + \beta)}{r(N + 1)\,r(N + \alpha + \beta + 1)},$$
where

$$r(N) = \frac{\Gamma((N + 1)/2)}{\sqrt{\Gamma(N + 1)}} = \Big(\frac{2\pi}{N}\Big)^{1/4} 2^{-N/2}\big(1 + O(N^{-1})\big), \tag{255}$$

with the last equality following from Stirling's formula as in the proof of Lemma 8. Substituting (255), we further find

$$\int_{-1}^{1} \phi_N = \sqrt{2\kappa}\,[N(N + \alpha)(N + \beta)(N + \alpha + \beta + 1)]^{-1/4}\big(1 + O(N^{-1})\big) = 2(\kappa_N \sigma_N)^{-1/2}(1 + \epsilon_N)$$

after exploiting (226). □
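As a numerical illustration of the last two steps, the sketch below compares the gamma-function expression $[\kappa/(2(N+1)(N+\alpha+\beta+1))]^{1/2}\,r(N+\alpha)r(N+\beta)/(r(N+1)r(N+\alpha+\beta+1))$, with $r(M) = \Gamma((M+1)/2)/\sqrt{\Gamma(M+1)}$ and $\kappa = 2N + \alpha + \beta + 1$, against the leading term $\sqrt{2\kappa}\,[N(N+\alpha)(N+\beta)(N+\alpha+\beta+1)]^{-1/4}$, and $r(N)$ against its approximation in (255). It assumes $\alpha$ and $\beta$ grow in proportion to $N$ (here $\alpha = 0.6N$, $\beta = 0.3N$, an arbitrary choice); $N$ times each log-error should stay bounded if the $O(N^{-1})$ claims hold.

```python
import math

def log_r(M):
    """log r(M), where r(M) = Gamma((M+1)/2) / sqrt(Gamma(M+1))."""
    return math.lgamma((M + 1) / 2) - 0.5 * math.lgamma(M + 1)

def log_r_255(M):
    """log of the leading term (2 pi / M)^{1/4} 2^{-M/2} in (255)."""
    return 0.25 * math.log(2 * math.pi / M) - 0.5 * M * math.log(2)

def log_int_phi_exact(N, alpha, beta):
    """log of the gamma-function expression for the integral of phi_N (N even)."""
    kappa = 2 * N + alpha + beta + 1
    return (0.5 * math.log(kappa / (2 * (N + 1) * (N + alpha + beta + 1)))
            + log_r(N + alpha) + log_r(N + beta)
            - log_r(N + 1) - log_r(N + alpha + beta + 1))

def log_int_phi_asymptotic(N, alpha, beta):
    """log of sqrt(2 kappa) [N(N+alpha)(N+beta)(N+alpha+beta+1)]^{-1/4}."""
    kappa = 2 * N + alpha + beta + 1
    return (0.5 * math.log(2 * kappa)
            - 0.25 * math.log(N * (N + alpha) * (N + beta) * (N + alpha + beta + 1)))

for N in (10, 100, 1000, 10000):
    alpha, beta = 0.6 * N, 0.3 * N
    err_int = log_int_phi_exact(N, alpha, beta) - log_int_phi_asymptotic(N, alpha, beta)
    err_r = log_r(N) - log_r_255(N)
    print(N, N * err_int, N * err_r)
```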

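A.6. Proof of (128). We show that $S_\tau$ has the same eigenvalues as $S_N$. Indeed, suppose that $g \in L^2[x_0, 1)$ satisfies $S_N g = \lambda g$. Set

$$h(s) = \sqrt{\tau'(s)}\,g(\tau(s)). \tag{256}$$

First observe that

$$\int_{s_0}^{\infty} h^2(s)\,ds = \int_{s_0}^{\infty} g^2(\tau(s))\,\tau'(s)\,ds = \int_{x_0}^{1} g^2(x)\,dx,$$

so that $h \in L^2[s_0, \infty)$ if and only if $g \in L^2[x_0, 1)$. In addition,

$$(S_\tau h)(s) = \int_{s_0}^{\infty} S_\tau(s, t) h(t)\,dt = \sqrt{\tau'(s)} \int_{s_0}^{\infty} S_N(\tau(s), \tau(t))\,g(\tau(t))\,\tau'(t)\,dt$$

$$= \sqrt{\tau'(s)} \int_{x_0}^{1} S_N(\tau(s), y)\,g(y)\,dy = \sqrt{\tau'(s)}\,\lambda g(\tau(s)) = \lambda h(s).$$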



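A.7. Proof of bounds for $\sigma_N \phi'_N(x)$.

Proof of (109). Differentiate (106) to obtain

$$\sigma_N \phi'_N(x) = \sigma_N e_N r'_N(x)\big[\mathrm{Ai}(\kappa^{2/3}\zeta) + \epsilon_2(x, \kappa)\big]$$

$$\qquad + e_N r_N(x)\big[\mathrm{Ai}'(\kappa^{2/3}\zeta)\,\sigma_N \kappa^{2/3}\dot\zeta(x) + \sigma_N \partial_x \epsilon_2(x, \kappa)\big] = D_{N1} + D'_{N1}. \tag{257}$$

Using (104) to rewrite $\sigma_N \kappa^{2/3}\dot\zeta$ as $\dot\zeta/\dot\zeta_N$, we further decompose the difference $\sigma_N \phi'_N(x) - \mathrm{Ai}'(s_N)$ as $\sum_{i=1}^{5} D_{Ni}$, with the new terms given by

$$D_{N2} = [e_N r_N(x) - 1]\,(\dot\zeta/\dot\zeta_N)(x)\,\mathrm{Ai}'(\kappa^{2/3}\zeta),$$

$$D_{N3} = [(\dot\zeta/\dot\zeta_N)(x) - 1]\,\mathrm{Ai}'(\kappa^{2/3}\zeta),$$

$$D_{N4} = \mathrm{Ai}'(\kappa^{2/3}\zeta) - \mathrm{Ai}'(s_N),$$

$$D_{N5} = e_N r_N(x)\,\sigma_N \partial_x \epsilon_2(x, \kappa).$$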
We first observe from (94) and the uniform bound on $\mathcal V$ that

$$|\mathrm{Ai}(\kappa^{2/3}\zeta) + \epsilon_2(x, \kappa)| \le C M(\kappa^{2/3}\zeta) E^{-1}(\kappa^{2/3}\zeta) \tag{258}$$

and that, using (117) and (113),

$$\left|\frac{r'_N}{r_N}(x)\right| \le c.$$

As a result, combining the two previous bounds with the argument used for $E_{N5}$, we obtain

$$|D_{N1}| \le \sigma_N \left|\frac{r'_N}{r_N}(x)\right| \cdot r_N(x) M(\kappa^{2/3}\zeta) E^{-1}(\kappa^{2/3}\zeta) \le C\sigma_N e^{-s_N} \le C N^{-2/3} e^{-s_N/2}. \tag{259}$$



Before turning to $D_{N2}$ and $D_{N3}$, we first remark, using $|\mathrm{Ai}'(x)| \le N(x)E^{-1}(x)$, that on $[s_L, s_1 N^{1/6}]$,

$$|\mathrm{Ai}'(\kappa^{2/3}\zeta)| \le N(\kappa^{2/3}\zeta) E^{-1}(\kappa^{2/3}\zeta) \le C s_N^{1/4} e^{-s_N}. \tag{260}$$

Indeed, we bound $N(\kappa^{2/3}\zeta)$ by using $N(x) \le C|x|^{1/4}$ and (114) to conclude that $|\kappa^{2/3}\zeta| \le 2s_N$. The bound for $E^{-1}(\kappa^{2/3}\zeta)$ uses Proposition 3. Combining (116), (117), (260) and (115), we find

$$|D_{N2}| \le C(1 + s_N)\sigma_N \cdot 2 \cdot C s_N^{1/4} e^{-s_N} \le C N^{-2/3} e^{-s_N/2},$$

$$|D_{N3}| \le C s_N \sigma_N \cdot C s_N^{1/4} e^{-s_N} \le C N^{-2/3} e^{-s_N/2}.$$

$D_{N4}$ is treated in exactly the same manner as the $E_{N2}$ term above, additionally using the equation $\mathrm{Ai}''(x) = x\,\mathrm{Ai}(x)$. Using (95), we can bound $D_{N5}$ as

$$|D_{N5}| \le C e_N \kappa^{-1} \cdot r_N(x)\sigma_N \kappa^{2/3}\hat f^{1/2}(x) \cdot N(\kappa^{2/3}\zeta) E^{-1}(\kappa^{2/3}\zeta).$$

From (110) and (104), we note that $\sigma_N \kappa^{2/3}\hat f^{1/2}(x) = \dot\zeta(x)/\dot\zeta_N$, and in combination with (117),

$$r_N(x)\sigma_N \kappa^{2/3}\hat f^{1/2}(x) = [\dot\zeta(x)/\dot\zeta_N]^{1/2} \le \sqrt2 \tag{261}$$

on $[s_L, s_1 N^{1/6}]$. Bringing in (260), we conclude

$$|D_{N5}| \le C\kappa^{-1} \cdot \sqrt2 \cdot C s_N^{1/4} e^{-s_N} \le C N^{-2/3} e^{-s_N/2}.$$
Preliminaries on $r_N$ and $r'_N$. Starting from (107), we find that

$$r_N(x) = \left[\frac{\dot\zeta(x)}{\dot\zeta_N}\right]^{-1/2} \qquad\text{and}\qquad \frac{r'_N(x)}{r_N(x)} = -\frac12\,\frac{\ddot\zeta(x)}{\dot\zeta(x)}.$$

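Writing $I(\sqrt f)$ for $\int_{x_+}^{x}\sqrt f$, taking logarithms in (96) and differentiating yields

$$(\log\zeta)' = \frac{2\sqrt f}{3 I(\sqrt f)}.$$

From this one readily finds that

$$\frac{(\log\zeta)''}{(\log\zeta)'} = \frac{f'}{2f} - \frac{\sqrt f}{I(\sqrt f)} \qquad\text{and}\qquad \frac{\ddot\zeta}{\dot\zeta} = (\log\zeta)' + \frac{(\log\zeta)''}{(\log\zeta)'} = \frac{f'}{2f} - \frac{\sqrt f}{3 I(\sqrt f)}.$$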

By straightforward algebra and bounds on both $f'/f$ and $\sqrt f / I(\sqrt f)$, one can check that for $x > x_+$,

$$\left|\frac{r'_N(x)}{r_N(x)}\right| \le \frac{C}{(x - x_+)(1 - x^2)},$$

where $C$ depends on $x_+$ and $x_-$. Recalling that $\tau'(s) = \sigma(1 - \tau^2(s))$, we have, for $x = \tau(s) = x_N + \sigma_N s_N(s)$,

$$\tau'(s)\left|\frac{r'_N(x)}{r_N(x)}\right| \le \frac{C\sigma}{\sigma_N s_N(s)} \le C N^{-1/6}, \tag{262}$$

since for $s \ge s_1 N^{1/6}$, we have $s_N(s) \ge C N^{1/6}$.

Bound for $\phi'_\tau(s)$. The differentiated function $\phi'_\tau(s) = \phi'_N(\tau(s))\tau'(s) = D_{N1}(s) + D'_{N1}(s)$ may be written in the form (257) with $\sigma_N$ replaced by $\tau'(s)$. The analog of (259) is

$$|D_{N1}(s)| \le \tau'(s)\left|\frac{r'_N(x)}{r_N(x)}\right| r_N(x) M(\kappa^{2/3}\zeta) E^{-1}(\kappa^{2/3}\zeta).$$

On $[s_L, s_1 N^{1/6}]$, this is bounded by $C N^{-2/3} e^{-s_N/2}$ exactly as in the local bound case. For $s \ge s_1 N^{1/6}$, we use (262) together with (162) and (163) as above to get

$$|D_{N1}(s)| \le C N^{-1/6} \cdot (\cdots) \cdot C e^{-s} \le C e^{-s}.$$

Using (95), (110) and the uniform bound on $\mathcal V$,

$$|D'_{N1}(s)| \le e_N r_N(x)\tau'(s)\big[|\mathrm{Ai}'(\kappa^{2/3}\zeta)|\,\kappa^{2/3}\dot\zeta + |\partial_x\epsilon_2(x, \kappa)|\big]$$

$$\qquad \le C e_N r_N(x)\tau'(s)\kappa^{2/3}\hat f^{1/2}(x) N(\kappa^{2/3}\zeta) E^{-1}(\kappa^{2/3}\zeta).$$
From (261) and (107),

$$r_N(x)\tau'(s)\kappa^{2/3}\hat f^{1/2}(x) = \sigma_N^{-1}\tau'(s) r_N^{-1}(x).$$

Using the $N$-asymptotics from (93) and then (107), we have

$$r_N^{-1}(x) N(\kappa^{2/3}\zeta) \le C r_N^{-1}(x)\kappa^{1/6}\zeta^{1/4} = C[\kappa_N \sigma_N \sqrt{f(x)}]^{1/2}.$$

Using $x = \tanh(u_N + \tau_N s)$ and (154) along with (155) and (104), we find

$$\kappa_N \sigma_N \sqrt{f(x)} \le C\kappa_N \sigma_N^{3/2}[s + \epsilon_N(s)]^{1/2}/(1 - x^2) \le C s^{1/2}/(1 - x^2).$$

Since $\tau'(s) = \sigma[1 - \tau^2(s)]$, we conclude that

$$\sigma_N^{-1}\tau'(s) r_N^{-1}(x) N(\kappa^{2/3}\zeta) \le C(\sigma/\sigma_N) s^{1/4},$$

and hence that

$$|D'_{N1}(s)| \le C s^{1/4} e^{-s} \le C e^{-s/2}.$$

Bounds for $\phi'_\tau(s) - \mathrm{Ai}'(s)$ and $\psi'_\tau(s) - \mathrm{Ai}'(s)$. Again, the real work is on $[s_L, s_1 N^{1/6}]$. For $\phi'_\tau(s)$, since $\mu = u_N$ and $\sigma = \tau_N$, the bound needed is already established at (153). For $\psi'_\tau(s)$, we follow the approach taken for $\psi_\tau(s)$, differentiating $\psi_\tau(t) = \phi_{N-1}(u_{N-1} + \tau_{N-1}t')$ to yield

$$\psi'_\tau(t) = \tau_{N-1}\phi'_{N-1}(u_{N-1} + \tau_{N-1}t')\,(dt'/dt) = \big[\mathrm{Ai}'(t') + O(N^{-2/3}e^{-t'/2})\big]\big[1 + O(N^{-1})\big],$$

using (153) and $dt'/dt = \tau_N \tau_{N-1}^{-1} = 1 + O(N^{-1})$. We now argue exactly as at (171) and (172), increasing each order of derivative by one. Since $\mathrm{Ai}''(t) = t\,\mathrm{Ai}(t)$, we nevertheless obtain the same bounds as before, so the proof of (161) follows.